(All of this comes from my rough, amateur understanding.) SVMs are more or less equivalent to linear regression in a "feature space", and also equivalent to a shallow neural network (~2-3 layers). This means their size more or less increases with the amount of data they are attempting to approximate, which in turn means they don't scale well to truly huge data sets.
Deep nets pulled ahead of SVMs at the point people figured out how to train them on truly huge data sets using GPUs and gradient descent (plus an ever-increasing arsenal of further tricks; all the schemes together are mind-boggling to read about).
This was basically because the depth of a deep neural net means that its size isn't as prone to increase with the size of the data.
I don't really know why SVMs haven't been able to scale to a multi-layer approach though I know people have tried (someone has tried just about everything these days).
Part of the story is that running simple code on GPUs may still be the most effective approach.
Close but not quite. The difference between a (soft-margin) SVM and kernelized linear regression is the choice of loss function: the SVM minimizes hinge loss, linear regression minimizes squared loss.
(Other choices of loss function and regularizer will also give you Elastic Net, LASSO, and logistic regression. From an engineering point of view I tend to think of the entire class as different flavors of "stochastic gradient descent", in the spirit of Vowpal Wabbit etc.)
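To make the "same optimizer, different loss" point concrete, here is a rough sketch in plain Python. It is not how Vowpal Wabbit or any SVM library is actually implemented; the names `dloss_dscore`, `sgd_fit`, `predict`, and the `TOY` data set are made up for illustration, and the model is a plain linear one (no kernel) with a small L2 penalty. The only thing that changes between the three models is the derivative of the loss with respect to the score.

```python
import math
import random

# Hypothetical toy data: (features, label) with labels in {-1, +1}.
TOY = [
    ((2.0, 1.0), 1), ((1.5, 2.0), 1), ((3.0, 2.5), 1),
    ((-1.0, -1.5), -1), ((-2.0, -0.5), -1), ((-1.5, -2.5), -1),
]

def dloss_dscore(loss, score, y):
    """Derivative of the per-example loss with respect to the linear score."""
    if loss == "hinge":      # max(0, 1 - y*score): linear soft-margin SVM
        return -y if y * score < 1.0 else 0.0
    if loss == "squared":    # 0.5 * (score - y)**2: linear regression
        return score - y
    if loss == "logistic":   # log(1 + exp(-y*score)): logistic regression
        z = y * score
        if z >= 0:           # numerically stable form for either sign of z
            e = math.exp(-z)
            return -y * e / (1.0 + e)
        return -y / (1.0 + math.exp(z))
    raise ValueError("unknown loss: %r" % loss)

def sgd_fit(data, loss, epochs=50, lr=0.05, l2=1e-3, seed=0):
    """Plain SGD on a linear model; only the loss derivative differs."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = dloss_dscore(loss, score, y)
            # Gradient step on the weights, including the L2 penalty term.
            w = [wi - lr * (g * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Classify by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

if __name__ == "__main__":
    for name in ("hinge", "squared", "logistic"):
        w, b = sgd_fit(TOY, name)
        print(name, [round(wi, 3) for wi in w], round(b, 3))
```

Swapping the L2 term for an L1 penalty (or a mix of both) is what moves you toward LASSO or Elastic Net; that part is left out here to keep the sketch short.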