Dropout is roughly equivalent to layer-specific L2 regularization, and it's not hard to see why: in expectation, dropping out random neurons adds a penalty term that shrinks each weight towards zero in proportion to its magnitude, which is exactly the effect of penalizing the (squared) weight magnitudes.
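If you want to see the linear-regression case concretely, here's a minimal numerical sketch (the data, weights, and keep-probability below are all made up for the demo): the expected squared loss under input dropout works out to the loss of a scaled predictor plus a feature-weighted L2 penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 5, 0.8  # samples, features, keep-probability (arbitrary choices)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

# Monte Carlo estimate of the expected squared loss when inputs are randomly dropped.
n_masks = 5000
mc = np.mean([
    np.sum((y - (X * rng.binomial(1, p, size=(n, d))) @ w) ** 2)
    for _ in range(n_masks)
])

# Closed form: loss on the scaled predictor plus a feature-weighted L2 penalty,
# i.e. dropout behaves like ridge with a per-feature scale.
closed = np.sum((y - p * X @ w) ** 2) + p * (1 - p) * np.sum((X ** 2) @ (w ** 2))

print(mc, closed)  # the two agree up to Monte Carlo error
```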
Trevor Hastie's Elements of Statistical Learning has a nice derivation showing that (for linear models) L2 regularization is also a soft form of dimensionality reduction: the fit gets shrunk along the principal components of the inputs, with the low-variance directions shrunk the most. You could use that to motivate a "simplicity prior" idea in deep learning.
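Here's a quick sketch of that connection on synthetic data with an arbitrary penalty: written in the SVD basis of X, the ridge fit shrinks each principal direction by d_j^2 / (d_j^2 + lambda), so the low-variance directions are effectively discarded.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 100, 6, 5.0  # sample size, features, ridge penalty (arbitrary)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Ridge fit computed directly.
beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
fit_direct = X @ beta

# Same fit in the SVD basis: each principal direction u_j is shrunk by
# d_j^2 / (d_j^2 + lam), so low-variance directions are (almost) dropped,
# a soft version of principal-components regression.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
fit_svd = U @ np.diag(s**2 / (s**2 + lam)) @ (U.T @ y)

print(np.allclose(fit_direct, fit_svd))  # True
```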
Yet another way of thinking about it, in the context of ReLU units, is that a layer of ReLUs forms a truncated hyper-plane basis in feature space (like splines, but in higher dimensions), and regularization induces smoothness in this N-dimensional basis by shrinking it towards a flat hyper-plane.
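In one dimension you can see the "elbows" directly. Here's a tiny sketch with made-up weights: a 1-D ReLU layer is just a linear spline whose slope changes by a_k * |w_k| at each unit's knot, so an L2 penalty on the weights directly shrinks those kinks towards a flat line.

```python
import numpy as np

# 1-D ReLU "layer": f(x) = sum_k a_k * max(0, w_k * x + b_k),
# i.e. a linear combination of hinge (truncated-line) basis functions.
w = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical first-layer weights
b = np.array([0.5, 1.0, -1.5, 2.0])  # hypothetical biases
a = np.array([0.7, -1.2, 2.0, 0.3])  # hypothetical output weights

def f(x):
    return float(np.maximum(0.0, w * x + b) @ a)

eps = 1e-6
for k, knot in enumerate(-b / w):  # unit k switches on/off at x = -b_k / w_k
    slope_left = (f(knot) - f(knot - eps)) / eps
    slope_right = (f(knot + eps) - f(knot)) / eps
    # The change of slope (the "elbow") at each knot has size a_k * |w_k|, so
    # shrinking the weights with L2 shrinks the kinks towards a flat function.
    print(round(slope_right - slope_left, 4), round(a[k] * abs(w[k]), 4))
```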
Wow! I think I dimly intuited your first paragraph already; I directionally get why your second might be true (although I'd have thought L1 was even more so, since it encourages zeros which is kind of like choosing a subspace).
Your third paragraph took me ages to get an intuition for - is the idea that regularisation penalises having "sharp elbows" at the join points of your hyper-spline thing? That's mind-blowing and such an interesting way to think about what a ReLU layer is doing.
Thanks so much for a thought provoking comment, that's incredibly cool.