There is a lot of transferable knowledge to gain from learning this stuff properly, even if you don’t expect to do core AI work in a commercial setting. Optimization, function fitting, probability/statistics, GPU programming…
My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in the literature.
Also, there are still plenty of topics to which the new techniques can probably be fruitfully applied, especially if you have some domain knowledge that the math/CS PhDs don’t have.
For OP - I’m in a similar situation and have been going through Kevin Murphy’s “Probabilistic Machine Learning”, which is pretty massive and dense but also very lucid.
> My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in the literature.
Is that really true? That's not my impression at all (though to be fair I haven't been keeping up with current research as much as I used to). My understanding is that there is still hardly any knowledge of what deep learning models (and large language models in particular) actually learn. The loss surfaces are opaque, and one still doesn't know why local minima reached by gradient descent tend to generalize fairly well for these models. The latent representations that generative language models learn are, with the exception of the occasional paper that finds some superficial correlations, hardly investigated at all and overall extremely poorly understood.
Very much interested in any references that contradict that sentiment.
Maybe I'm biased specifically because of the book I mentioned. For me it's providing a theoretical basis for many things I had previously learned only in a hand-wavy way (e.g. way back I took Hinton's NN course and Ng's ML course and learned about gradient descent, momentum, regularization, training/validation loss, etc.), and now with this book I feel like I get the bigger picture in the context of optimization/stats for the first time.
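To give a flavor of the kind of thing I mean: momentum and L2 regularization come down to a couple of lines of an update rule once you see them as optimization choices rather than recipes. A toy sketch of my own (plain NumPy, not code from the book):

    import numpy as np

    # Toy illustration (mine, not from Murphy's book): fit a linear model
    # with full-batch gradient descent, heavy-ball momentum,
    # and L2 regularization (weight decay).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=200)

    w = np.zeros(5)                  # parameters
    v = np.zeros(5)                  # momentum buffer
    lr, beta, lam = 0.05, 0.9, 1e-3  # step size, momentum, L2 strength

    for step in range(500):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w  # MSE gradient + L2 term
        v = beta * v + grad                                   # accumulate momentum
        w -= lr * v                                           # update parameters

    print(np.round(w - w_true, 3))   # residual error, should be close to zero

Nothing fancy - just heavy-ball momentum plus weight decay on a least-squares problem - but the book is what connects knobs like beta and lam to the underlying optimization/stats picture for me.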
The previous version of this book was from 2012, though, and I'm not 100% sure how much of the material in the current edition is new (there is definitely a _lot_ more deep learning stuff in it).
So yeah, it could be that my impression is wrong, or that I made the scope it applies to sound bigger than it is.
Almost all of the content that the new book covers, with the exception of the third part on deep learning, is theory that was invented before 2012. Classical (non-deep-learning) ML is actually very rigorous compared to modern ML: there are good theorems (statistical learning theory) for most of the classical models I'm aware of.
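To make "good theorems" concrete, this is the flavor of guarantee I have in mind - the standard uniform-convergence bound for a finite hypothesis class, which follows from Hoeffding's inequality plus a union bound (notation is mine, loss assumed bounded in [0,1]):

    % n i.i.d. samples, loss bounded in [0,1],
    % finite hypothesis class H, any delta in (0,1):
    \Pr\left[\; \forall h \in \mathcal{H}:\;
        R(h) \;\le\; \hat{R}_n(h)
        + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}} \;\right] \;\ge\; 1 - \delta

Here R(h) is the true risk and \hat{R}_n(h) the empirical risk. VC dimension and Rademacher complexity extend this to infinite hypothesis classes; the point is that for classical models you can actually write such statements down, which is exactly the kind of guarantee that's still missing for deep nets.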