Both Bayesian inference and deep learning can do function fitting: given observations y and explanatory variables x, you try to find a function f so that y ≈ f(x). The function f can have few parameters (e.g. f(x) = ax + b for linear regression) or millions of parameters (the usual case for deep learning). You can try to find the best value for each of these parameters, or admit that each parameter has some uncertainty and try to infer a distribution for it. The first approach uses optimization, and in the last decade that's been done via various flavors of gradient descent. The second uses Monte Carlo sampling, typically MCMC. When you have few parameters, gradient descent is smoking fast. Above a certain number of parameters (which is surprisingly small, say about 100), gradient descent fails to converge to the optimum, but in many cases gets to a place that is "good enough". Good enough to make the practical applications useful. In pretty much all cases, though, Bayesian inference via MCMC is painfully slow compared to gradient descent.
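To make the optimization side concrete, here is a minimal sketch (my illustration, not part of the original comment) of fitting f(x) = ax + b by gradient descent on the squared error, on made-up toy data:

```python
import numpy as np

# Toy data (made up for this sketch): y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=200)

# Fit f(x) = a*x + b by gradient descent on the mean squared error.
a, b = 0.0, 0.0
lr = 0.1  # learning rate, an arbitrary choice for this toy problem
for _ in range(2000):
    resid = a * x + b - y
    a -= lr * 2.0 * np.mean(resid * x)  # d/da of mean(resid^2)
    b -= lr * 2.0 * np.mean(resid)      # d/db of mean(resid^2)

print(a, b)  # converges close to the true values 2 and 1
```

With two parameters this converges almost instantly; the point of the comment is that the same machinery, scaled to millions of parameters, still lands somewhere "good enough".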

But there is a case where Bayesian inference makes sense: when you have reasonably few parameters and you can understand their meaning. And this is exactly the case for what are called "statistical models". That's why Stan is called a statistical modeling language.
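For a small, interpretable model like that, the Bayesian side can be sketched with a random-walk Metropolis sampler — a deliberately crude stand-in for what Stan does far more efficiently with HMC/NUTS. The data, priors, and tuning constants below are all illustrative:

```python
import numpy as np

# Toy data for the same linear model; all numbers here are illustrative.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = 2.0 * x + 1.0 + 0.5 * rng.normal(size=100)

def log_post(a, b, sigma=0.5):
    # Gaussian likelihood with known noise sigma, plus weak N(0, 10^2) priors.
    resid = y - (a * x + b)
    return (-0.5 * np.sum(resid**2) / sigma**2
            - (a**2 + b**2) / (2.0 * 10.0**2))

# Random-walk Metropolis: slow but simple, and it yields a full
# posterior distribution for (a, b) rather than a single point estimate.
a, b = 0.0, 0.0
lp = log_post(a, b)
samples = []
for _ in range(20000):
    a_prop, b_prop = a + 0.1 * rng.normal(), b + 0.1 * rng.normal()
    lp_prop = log_post(a_prop, b_prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept rule
        a, b, lp = a_prop, b_prop, lp_prop
    samples.append((a, b))

post = np.array(samples[5000:])  # discard burn-in
print("posterior mean:", post.mean(axis=0))  # near the true (2, 1)
print("posterior sd:  ", post.std(axis=0))   # uncertainty a point estimate doesn't give
```

Note how much more work this is per effective sample than the gradient-descent loop: 20,000 posterior evaluations for two parameters, which is the "painfully slow" trade-off from the previous paragraph.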

How is that? Gradient descent for these small'ish models is just MLE (maximum likelihood estimation). People have been doing MLE for 100 years, and they understand its ins and outs. But some models are simply unsuited for MLE: their likelihood function is called "singular", meaning there are places in parameter space where the likelihood becomes infinite despite the fit being quite poor. One way to fix that is to "regularize" the problem, i.e. to add an artificial penalty that does not allow the likelihood to become infinite. But this regularization is often subjective: you never know whether the penalty you added is small enough not to alter the final fit. Another way is to do Bayesian inference. It's very slow, but you don't get pulled towards the singular parameters.
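A minimal numeric illustration of a singular likelihood (my example, not the commenter's) is the classic two-component Gaussian mixture: pin one component's mean exactly on a single observation and shrink its standard deviation, and the likelihood diverges even though the resulting "fit" is just a spike on one data point:

```python
import numpy as np

# A classic singular model: a two-component Gaussian mixture.
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 25)  # the data really come from ONE Gaussian

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_loglik(x, mu1, s1, mu2, s2, w=0.5):
    return float(np.sum(np.log(w * normal_pdf(x, mu1, s1)
                               + (1 - w) * normal_pdf(x, mu2, s2))))

mu_spike = data[0]  # place component 2 exactly on one observation
lls = [mixture_loglik(data, 0.0, 1.0, mu_spike, s2)
       for s2 in (1.0, 1e-6, 1e-12, 1e-18)]
print(lls)  # grows without bound as s2 -> 0: the singularity MLE gets pulled into
```

An optimizer chasing the maximum likelihood will happily drive s2 toward zero; a prior on s2 (the Bayesian route) or a penalty on small variances (the regularization route) is what keeps the fit away from this degenerate solution.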


