The author points out that Newton's method needs a good initial guess. This is especially true for functions that are not well approximated by a quadratic. If, for example, you want the optimization to converge to a nearby minimum (and not shoot off toward infinity), a very simple fix is to impose a maximum step size that limits how far the next x value can be from the current one.
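A minimal sketch of that fix in one dimension (the function and clamp value here are made up for illustration): take the usual Newton step for minimization, but clamp it to a maximum magnitude before applying it.

```python
def damped_newton_min(df, ddf, x0, max_step=0.5, tol=1e-10, max_iter=100):
    """Newton's method for 1-D minimization, with the step clamped to
    +/- max_step so a bad quadratic model can't fling the iterate away."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / ddf(x)                        # full Newton step
        step = max(-max_step, min(max_step, step))   # clamp the step
        x_next = x - step
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Minimize f(x) = x**4 - x; the minimum is at x = (1/4)**(1/3).
xmin = damped_newton_min(lambda x: 4 * x**3 - 1, lambda x: 12 * x**2, x0=2.0)
```

Starting from x0 = 2, the first unclamped Newton step would be about 0.65; the clamp shortens it, and the iteration still converges quickly once the quadratic model becomes accurate.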
This technique is used when optimizing atomic structures, where the energy changes rapidly as a function of distance between the atoms. Newton's method (or more typically, one of its approximations, like L-BFGS) without a maximum step size would just blast all the atoms apart, but with a maximum step size, it works extremely well.
This technique does require some domain knowledge: you need to know what step size counts as "big", which is not always easy.
Brent's method is a nice combination of several of these methods, maintaining the robustness of bisection while approaching the speed of Newton: https://en.wikipedia.org/wiki/Brent%27s_method
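The core idea can be sketched in a few lines: try a fast interpolation step, and fall back to bisection whenever the candidate leaves the bracket. This is a simplified Dekker-style hybrid using only secant steps; real Brent adds inverse quadratic interpolation and more careful bookkeeping, and all names here are illustrative.

```python
def hybrid_root(f, a, b, tol=1e-12, max_iter=100):
    """Bracketed root finder: secant step when it stays inside the
    bracket, bisection otherwise. Assumes f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    for _ in range(max_iter):
        c = b - fb * (b - a) / (fb - fa)   # secant candidate
        if not (min(a, b) < c < max(a, b)):
            c = (a + b) / 2                # fall back to bisection
        fc = f(c)
        if abs(fc) < tol:
            return c
        # keep the sign change inside the bracket
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c

# Find the root of x**3 - 2 on [1, 2], i.e. the cube root of 2.
root = hybrid_root(lambda x: x**3 - 2, 1.0, 2.0)
```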
Uhh, not all, but maybe most in production. You can use any optimization technique you want to train the weights, including things like evolutionary algorithms or simulated annealing, which are entirely different from what's listed here. Evolutionary-style methods may be SOTA for continuous-control reinforcement learning problems... Consider how backprop, hill climbing, or L-BFGS performs on something basic like cart pole.
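For anyone unfamiliar with the evolutionary style: here is a minimal (1+λ) evolution strategy sketch on a toy 1-D objective (the objective, population size, and decay schedule are all made up for illustration; real ES variants for RL are far more elaborate).

```python
import random

def evolve(f, x0, sigma=0.5, lam=20, gens=100):
    """Minimal (1+lambda) evolution strategy for minimizing f:
    sample lam Gaussian perturbations of the parent, keep the best
    candidate (parent included), and slowly anneal the mutation size."""
    x = x0
    for _ in range(gens):
        candidates = [x] + [x + random.gauss(0, sigma) for _ in range(lam)]
        x = min(candidates, key=f)   # elitist selection
        sigma *= 0.97                # anneal the step size
    return x

# Toy example: minimize (x - 3)**2 starting from 0 — no gradients needed.
random.seed(0)
best = evolve(lambda x: (x - 3) ** 2, x0=0.0)
```

Note there is no gradient anywhere: only function evaluations, which is why these methods apply to non-differentiable or noisy objectives like RL returns.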
I have seen this paper. The blog post does not report that they were able to train a better agent than A3C, only that ES allowed them to use more compute to train in parallel.
There are also bracketing methods, where you first isolate a root to an interval and then approximate it within that interval. Regula falsi and the secant method are examples of these, and in my experience regula falsi in particular works very well.
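A sketch of regula falsi with the Illinois modification, which fixes the classic stagnant-endpoint slowdown by halving the retained endpoint's function value when the same side is kept twice in a row (the test function here is an arbitrary example, not from the thread):

```python
import math

def illinois(f, a, b, tol=1e-12, max_iter=100):
    """Regula falsi (false position) with the Illinois modification.
    Assumes f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    side = 0
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # false-position point
        fc = f(c)
        if abs(fc) < tol:
            return c
        if fa * fc < 0:          # root lies in [a, c]: replace b
            b, fb = c, fc
            if side == -1:       # b replaced twice in a row:
                fa /= 2          # halve the stagnant endpoint's value
            side = -1
        else:                    # root lies in [c, b]: replace a
            a, fa = c, fc
            if side == 1:
                fb /= 2
            side = 1
    return c

# Solve cos(x) = x on [0, 1].
r = illinois(lambda x: math.cos(x) - x, 0.0, 1.0)
```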