They are both universal approximators. So are support vector machines, Gaussian processes, and gradient-boosted trees. Yet the performance of neural networks on certain tasks is unrivaled, as has been demonstrated over and over again.
As a whole, that paper is quite bad (and still unpublished, probably blocked by peer review) because (1) it only considers fully connected networks (which are a minority of models used nowadays) and (2) the experimental validation was done on tasks where neural networks are not very strong.
Show me examples of competitive polynomial regression models in language translation, image segmentation and Go playing and I will be convinced.
To be fair, most of the machine learning literature has had no peer review, or only a poor excuse for it. And many deep learning layers can be expressed as dense layers; convolution, for example, is just a dense layer whose weight matrix is sparse and has its weights tied across positions.
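A minimal sketch of that equivalence (function names are mine, not from any library): a valid 1D convolution produces exactly the same output as a dense layer whose weight matrix is a banded Toeplitz matrix, i.e. the same kernel weights repeated on each shifted row with zeros elsewhere.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (really cross-correlation, as in deep learning)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def conv_as_dense_matrix(kernel, input_len):
    """Build the dense-layer weight matrix equivalent to the convolution."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel  # shared weights on each shifted row, zeros elsewhere
    return W

x = np.random.randn(8)
kernel = np.array([1.0, -2.0, 0.5])
W = conv_as_dense_matrix(kernel, len(x))
assert np.allclose(conv1d(x, kernel), W @ x)  # identical outputs
```

The point being: weight sharing and sparsity are constraints on a dense layer, not something beyond it, so "only considers fully connected networks" is less limiting than it sounds.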
You almost certainly can frame Go playing as polynomial regression, but there would undoubtedly be numerical precision and other gradient issues. Deep learning as a practice is remarkably effective, no disagreement there.