Actually, I read the batch norm paper and maybe I forgot important details, but it roughly went like this: "here we add a term `b` to make sure the mean values of Ax+b are zero, which will help us with convergence; ah, and here is a covariance matrix!" — with no quantitative proof of how much that convergence was actually helped. Don't get me wrong, I intuitively agree that shifting the mean to zero should help, but math taught me that there is a huge difference between a seemingly correct statement and its proof. ML papers seem to just state these seemingly correct ideas without a real, proof-backed understanding of why they work. In other words, ML is largely about empirical results, peppered with math-like terminology. But don't take my blunt writing style personally.
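For what it's worth, the mechanism being described boils down to a few lines. This is a minimal numpy sketch of the batch norm forward pass (training mode only; the learned scale `gamma` and shift `beta` are what the paper adds on top of plain normalization — names and epsilon value are conventional choices, not taken from this thread):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features). Normalize each feature to zero mean
    # and unit variance across the batch, then apply the learned affine
    # transform gamma * x_hat + beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))   # deliberately off-center input
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
# with gamma=1, beta=0 the per-feature means of y are ~0 and variances ~1
```

Whether this provably accelerates convergence is exactly the open question above; the code only shows what the operation does, not why it helps.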
Let's take the simplest example: recognizing grayscale 28x28 pictures of the digits 0-9. This is the MNIST example and can be done by my cat in 1 hour without prior knowledge. Let's choose what is probably the simplest model: 784 inputs fully connected to a 1024-vector, which is fully connected to a 10-vector, with relu at both steps. We know that this kinda works and converges quickly; in particular, after T steps we get error E(T), and E(1e6) < 0.03 (the number is a rough guess on my part). Can you tell me how T and E will change if we add another layer, 784->1024->1024->10, using the same relu? Same question, but replacing relu with tanh: 784->1024->10.
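To make the set-ups being compared concrete, here is a minimal numpy sketch of the three forward passes (784 inputs assumes standard 28x28 MNIST images; the He-style weight init and batch size are arbitrary illustration choices, and in practice the last layer would feed a softmax rather than another activation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, layers, act=relu):
    # Apply each fully connected layer followed by the activation.
    for W, b in layers:
        x = act(x @ W + b)
    return x

rng = np.random.default_rng(0)

def init(sizes):
    # One (W, b) pair per consecutive size pair, e.g. [784, 1024, 10].
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(32, 784))              # a batch of 32 flattened 28x28 images
relu_2layer  = mlp_forward(x, init([784, 1024, 10]))              # 784->1024->10, relu
relu_3layer  = mlp_forward(x, init([784, 1024, 1024, 10]))        # 784->1024->1024->10, relu
tanh_2layer  = mlp_forward(x, init([784, 1024, 10]), act=np.tanh) # 784->1024->10, tanh
```

This only defines the architectures; answering how T and E change still requires actually training them, which is the commenter's point.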
I think you and the person you're responding to might have slightly different expectations about what level of rigor counts as "math", much like how physicists and theoretical mathematicians often have somewhat different ideas about rigor.
My impression is that ML is of course guided by math, and people do want to understand why some things converge and others don't. But "in the field" many people just mess around with different set-ups and see what works (especially in deep learning), and theory sometimes follows to explain why it worked. I think you're right that a lot of progress in the field is based on intuition and some reasoning (e.g. trying something like an inception network) rather than derivations showing that a particular set-up should be successful. I get the impression that most low-level components are pretty well understood, but when they are stacked and combined it gets more complicated.
I would be very curious to see a video of your cat solving MNIST in 1 hour!