The author points out that Newton's method needs a good initial guess. This is especially true for functions that are not well approximated by a quadratic. If, for example, you want the optimization to converge to a nearby minimum (and not shoot off toward infinity), a very simple fix is to impose a maximum step size that limits how far the next x value can be from the current one.
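A minimal sketch of that fix in one dimension (the function and clamp value here are made up for illustration): take the usual Newton step for minimization, but clamp it to a maximum magnitude before applying it.

```python
def damped_newton_min(df, ddf, x0, max_step=0.5, tol=1e-10, max_iter=100):
    """Newton's method for 1-D minimization, with the step clamped to
    +/- max_step so a bad quadratic model can't fling the iterate away."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / ddf(x)                        # full Newton step
        step = max(-max_step, min(max_step, step))   # clamp the step
        x_next = x - step
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Minimize f(x) = x**4 - x; the minimum is at x = (1/4)**(1/3).
xmin = damped_newton_min(lambda x: 4 * x**3 - 1, lambda x: 12 * x**2, x0=2.0)
```

Starting from x0 = 2, the first unclamped Newton step would be about 0.65; the clamp shortens it, and the iteration still converges quickly once the quadratic model becomes accurate.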
This technique is used when optimizing atomic structures, where the energy changes rapidly as a function of distance between the atoms. Newton's method (or more typically, one of its approximations, like L-BFGS) without a maximum step size would just blast all the atoms apart, but with a maximum step size, it works extremely well.
This technique does require some domain knowledge: you need to know what step size counts as "big", which is not always easy.
Brent's method is a nice combination of several of these methods, maintaining the robustness of bisection while approaching the speed of Newton: https://en.wikipedia.org/wiki/Brent%27s_method
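The core idea can be sketched in a few lines: try a fast interpolation step, and fall back to bisection whenever the candidate leaves the bracket. This is a simplified Dekker-style hybrid using only secant steps; real Brent adds inverse quadratic interpolation and more careful bookkeeping, and all names here are illustrative.

```python
def hybrid_root(f, a, b, tol=1e-12, max_iter=100):
    """Bracketed root finder: secant step when it stays inside the
    bracket, bisection otherwise. Assumes f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    for _ in range(max_iter):
        c = b - fb * (b - a) / (fb - fa)   # secant candidate
        if not (min(a, b) < c < max(a, b)):
            c = (a + b) / 2                # fall back to bisection
        fc = f(c)
        if abs(fc) < tol:
            return c
        # keep the sign change inside the bracket
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c

# Find the root of x**3 - 2 on [1, 2], i.e. the cube root of 2.
root = hybrid_root(lambda x: x**3 - 2, 1.0, 2.0)
```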
Uhh, not all, but maybe most in production. You can use any optimization technique you want to train the weights, including things like evolutionary algorithms or simulated annealing, which are entirely different from what's listed here. Evolutionary-style methods may be SOTA for continuous-control reinforcement learning problems... Consider how backprop, hill climbing, or L-BFGS performs on something basic like cart pole.
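For anyone unfamiliar with the evolutionary style: here is a minimal (1+λ) evolution strategy sketch on a toy 1-D objective (the objective, population size, and decay schedule are all made up for illustration; real ES variants for RL are far more elaborate).

```python
import random

def evolve(f, x0, sigma=0.5, lam=20, gens=100):
    """Minimal (1+lambda) evolution strategy for minimizing f:
    sample lam Gaussian perturbations of the parent, keep the best
    candidate (parent included), and slowly anneal the mutation size."""
    x = x0
    for _ in range(gens):
        candidates = [x] + [x + random.gauss(0, sigma) for _ in range(lam)]
        x = min(candidates, key=f)   # elitist selection
        sigma *= 0.97                # anneal the step size
    return x

# Toy example: minimize (x - 3)**2 starting from 0 — no gradients needed.
random.seed(0)
best = evolve(lambda x: (x - 3) ** 2, x0=0.0)
```

Note there is no gradient anywhere: only function evaluations, which is why these methods apply to non-differentiable or noisy objectives like RL returns.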
I have seen this paper. The blog post does not report that they were able to train a better agent than A3C, only that ES allowed them to use more compute to train in parallel.
There are also bracketing methods, where you first isolate a root to an interval and then approximate it within that interval. Regula falsi and the secant method are examples of these, and in my experience regula falsi in particular works very well.
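A sketch of regula falsi with the Illinois modification, which fixes the classic stagnant-endpoint slowdown by halving the retained endpoint's function value when the same side is kept twice in a row (the test function here is an arbitrary example, not from the thread):

```python
import math

def illinois(f, a, b, tol=1e-12, max_iter=100):
    """Regula falsi (false position) with the Illinois modification.
    Assumes f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    side = 0
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # false-position point
        fc = f(c)
        if abs(fc) < tol:
            return c
        if fa * fc < 0:          # root lies in [a, c]: replace b
            b, fb = c, fc
            if side == -1:       # b replaced twice in a row:
                fa /= 2          # halve the stagnant endpoint's value
            side = -1
        else:                    # root lies in [c, b]: replace a
            a, fa = c, fc
            if side == 1:
                fb /= 2
            side = 1
    return c

# Solve cos(x) = x on [0, 1].
r = illinois(lambda x: math.cos(x) - x, 0.0, 1.0)
```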