I agree that ML tends to put weaker assumptions on the data than classical statistics, and that this is a good thing.
However, most ML certainly makes distributional assumptions - they are just weaker. When you train a huge deep net with an L2 loss on a regression task, there is a parametric conditional Gaussian distribution under the hood: minimizing the squared error is equivalent to maximizing a Gaussian likelihood with fixed variance. Overparametrization doesn't make that assumption go away. Vanilla autoencoders with an MSE reconstruction loss work under the same multivariate Gaussian setup, and most classifiers trained with cross-entropy implicitly assume a categorical (multinomial) distribution over the labels.
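To make the L2/Gaussian point concrete, here's a minimal sketch (numpy only, toy data of my own invention) showing that the mean squared error and the Gaussian negative log-likelihood with fixed variance differ only by constants, so they have the same minimizer and proportional gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)            # targets
y_pred = y_true + rng.normal(size=1000)   # some model's predictions

# Plain L2 / mean squared error.
mse = np.mean((y_true - y_pred) ** 2)

# Negative log-likelihood of the targets under N(y_pred, sigma^2), sigma = 1.
sigma = 1.0
nll = np.mean(
    0.5 * np.log(2 * np.pi * sigma**2)
    + (y_true - y_pred) ** 2 / (2 * sigma**2)
)

# nll == 0.5 * mse + constant, so minimizing one minimizes the other.
print(np.isclose(nll, 0.5 * mse + 0.5 * np.log(2 * np.pi)))  # True
```

The same bookkeeping works for cross-entropy, which is the negative log-likelihood of a categorical distribution over class labels.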
And fat-tailed distributions are definitely a thing. It's just less of a concern for the mainstream CV problems on which people apply DL.
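A quick illustration of why fat tails break Gaussian-flavored machinery (a hedged sketch, using the Cauchy distribution as the standard textbook example): the sample mean never stabilizes because the Cauchy has no finite mean, while the sample median behaves fine.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10**2, 10**4, 10**6):
    x = rng.standard_cauchy(n)
    # The mean keeps bouncing around as n grows; the median converges.
    print(f"n={n:>7}: mean={np.mean(x):+9.2f}  median={np.median(x):+.3f}")
```

Any estimator whose justification leans on squared-error (i.e. Gaussian-style) assumptions quietly breaks on data like this, which is why the fat-tail concern is real even if it rarely bites on mainstream CV benchmarks.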