
I read a book on NNs by Robert Hecht-Nielsen in 1989, during the NN hype of the time (I believe it was the 2nd hype cycle, the first beginning with Rosenblatt's original hardware perceptron and dying with Minsky and Papert's "Perceptrons" book two decades earlier).

Everything described was laughably basic by modern standards, but the motivation given in that book was the Kolmogorov representation theorem: a modest 3-layer network with the right activation functions can represent any continuous function from m inputs to n outputs.
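
For reference, that result is Kolmogorov's superposition theorem. In the single-output form usually quoted (the n-output case follows coordinate-wise), every continuous f on the m-dimensional unit cube decomposes into sums and compositions of continuous one-variable functions:

    f(x_1, \ldots, x_m) = \sum_{q=0}^{2m} \Phi_q\!\Big( \sum_{p=1}^{m} \psi_{q,p}(x_p) \Big)

Hecht-Nielsen's 1987 reading was that the inner sums act as a hidden layer of 2m+1 units and the outer functions \Phi_q as the output layer, i.e. exactly a 3-layer network.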

Most research back then focused on 3-layer networks, possibly for that reason. Sigmoid activation was king, and vanishing gradients the main issue. It took two decades until AlexNet brought NN research back from the AI winter of the 1990s.
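
A minimal sketch (my illustration, not from the book) of why sigmoids starve deep networks of gradient: the backprop chain rule picks up a factor of s'(x) = s(x)(1 - s(x)) <= 0.25 at every sigmoid layer, so the gradient shrinks roughly geometrically with depth:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    x = rng.normal()            # scalar "activation" flowing forward
    grad = 1.0                  # upstream gradient, taken as 1 for illustration
    for _ in range(20):         # 20 sigmoid layers, one random weight each
        w = rng.normal()        # hypothetical weight, roughly unit scale
        s = sigmoid(w * x)
        grad *= w * s * (1.0 - s)   # chain-rule factor; s(1-s) <= 0.25
        x = s
    print(f"gradient magnitude after 20 sigmoid layers: {abs(grad):.3e}")

With weights near unit scale the per-layer factor is at most about 0.25, so 20 layers cost you roughly 0.25^20, on the order of 10^-12 of the signal, which is part of why 3 layers was about the practical limit before ReLU and better initialization.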


