There is a lot of transferable knowledge to gain from learning this stuff properly, even if you don’t expect to do core AI work in a commercial setting. Optimization, function fitting, probability/statistics, GPU programming…
My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in the literature.
Also, there are still plenty of topics to which the new techniques can probably be fruitfully applied, especially if you have some domain knowledge that the math/CS PhDs don’t have.
For OP - I’m in a similar situation and have been going through Kevin Murphy’s “Probabilistic Machine Learning”, which is pretty massive and dense but also very lucid.
> My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in the literature.
Is that really true? That's not my impression at all (though to be fair I haven't been keeping up with current research as much as I used to). My understanding is that there is still hardly any knowledge of what deep learning models (and large language models in particular) actually learn. The loss surfaces are opaque, and one still doesn't know why local minima reached by gradient descent tend to generalize fairly well for these models. The latent representations that generative language models learn are, with the exception of the occasional paper that finds some superficial correlations, hardly investigated at all and overall extremely poorly understood.
Very much interested in any references that contradict that sentiment.
Maybe I'm biased specifically because of the book I mentioned. For me it's providing a theoretical basis for many things I had previously learned only in a hand-wavy way (e.g. way back I took Hinton's NN course and Ng's ML course and learned about gradient descent, momentum, regularization, training/validation loss, etc.), and now with this book I feel like I get the bigger picture in the context of optimization/stats for the first time.
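To give a flavor of the kind of thing I mean: momentum and L2 regularization come down to a couple of lines of an update rule once you see them as optimization choices rather than recipes. A toy sketch of my own (plain NumPy, not code from the book):

    import numpy as np

    # Toy illustration (mine, not from Murphy's book): fit a linear model
    # with full-batch gradient descent, heavy-ball momentum,
    # and L2 regularization (weight decay).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=200)

    w = np.zeros(5)                  # parameters
    v = np.zeros(5)                  # momentum buffer
    lr, beta, lam = 0.05, 0.9, 1e-3  # step size, momentum, L2 strength

    for step in range(500):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w  # MSE gradient + L2 term
        v = beta * v + grad                                   # accumulate momentum
        w -= lr * v                                           # update parameters

    print(np.round(w - w_true, 3))   # residual error, should be close to zero

Nothing fancy - just heavy-ball momentum plus weight decay on a least-squares problem - but the book is what connects knobs like beta and lam to the underlying optimization/stats picture for me.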
The previous version of this book was from 2012, though, and I'm not 100% sure how much of the material in the current edition is new (there is definitely a _lot_ more deep learning stuff in it).
So yeah, it could be that my impression is wrong, or that I made the scope it applies to sound bigger than it is.
Almost all of the content that the new book covers, with the exception of the third part on deep learning, is theory that was invented before 2012. Classical (non-deep-learning) ML is actually very rigorous compared to modern ML: there are good theorems (statistical learning theory) for most of the classical models I'm aware of.
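To make "good theorems" concrete, this is the flavor of guarantee I have in mind - the standard uniform-convergence bound for a finite hypothesis class, which follows from Hoeffding's inequality plus a union bound (notation is mine, loss assumed bounded in [0,1]):

    % n i.i.d. samples, loss bounded in [0,1],
    % finite hypothesis class H, any delta in (0,1):
    \Pr\left[\; \forall h \in \mathcal{H}:\;
        R(h) \;\le\; \hat{R}_n(h)
        + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}} \;\right] \;\ge\; 1 - \delta

Here R(h) is the true risk and \hat{R}_n(h) the empirical risk. VC dimension and Rademacher complexity extend this to infinite hypothesis classes; the point is that for classical models you can actually write such statements down, which is exactly the kind of guarantee that's still missing for deep nets.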