I am still learning about Bayesian inference, so this might be off-base, but isn't the point to compute the full posterior distribution (or an approximation thereof) over the underlying parameters? Whether this is done in the context of a linear model or a deep neural network is a question of tractability.
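To make "full posterior" concrete, here is a minimal sketch in one of the few settings where it is exactly tractable: a conjugate Beta-Bernoulli model of a coin's bias (the flips are made up). The same computation has no closed form for a deep network's weights, which is where approximations come in.

```python
import numpy as np

# Beta-Bernoulli: one of the few models where the full posterior over
# the parameter theta (the coin's bias) is available in closed form.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical coin flips

alpha_prior, beta_prior = 1.0, 1.0               # uniform Beta(1, 1) prior
alpha_post = alpha_prior + data.sum()            # heads update alpha
beta_post = beta_prior + len(data) - data.sum()  # tails update beta

# The result is Beta(alpha_post, beta_post): a full distribution over
# theta, not a point estimate.
print(f"posterior: Beta({alpha_post:.0f}, {beta_post:.0f})")
print(f"posterior mean: {alpha_post / (alpha_post + beta_post):.3f}")
```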
The other distinction is between discriminative and generative models. In a discriminative model, the output/label is predicted from the input features: p(y|x, theta). For example, the probability y of an image containing a dog, based on its pixels x. Theta here refers to the parameters one needs to learn.
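As a toy illustration of p(y|x, theta), here is a sketch using logistic regression as the discriminative model (the features and parameters are made up, not learned):

```python
import numpy as np

def p_y_given_x(x, theta):
    """Discriminative model: p(y = dog | x, theta) via logistic regression."""
    return 1.0 / (1.0 + np.exp(-x @ theta))

x = np.array([0.5, -1.2, 3.0])      # made-up pixel features
theta = np.array([0.1, 0.4, -0.3])  # parameters to be learned
print(p_y_given_x(x, theta))        # probability the image contains a dog
```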
In a generative model, one instead models the distribution p(x|y, beta), i.e. given the label, say dog, modeling the distribution over all images with that label.
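And a matching toy sketch of p(x|y, beta), assuming class-conditional Gaussians over two features in place of real images; once you can model this density you can also sample new x's, which is what makes the model "generative":

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative model: p(x | y, beta) as a class-conditional Gaussian.
# beta holds a (mean, std) per class -- toy values, not learned here.
beta = {"dog": (np.array([0.8, 0.2]), 0.1),
        "cat": (np.array([0.1, 0.9]), 0.1)}

def sample_x_given_y(y, n=3):
    """Draw feature vectors x from p(x | y, beta)."""
    mean, std = beta[y]
    return rng.normal(mean, std, size=(n, mean.shape[0]))

print(sample_x_given_y("dog"))  # three plausible "dog" feature vectors
```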
Neural networks with backpropagation can be used for both discriminative and generative models. Likewise, Bayesian methods can be applied to both discriminative and generative models to compute the full posterior distribution over the parameters, theta and beta.
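For instance, here is a sketch of Bayesian inference applied to a discriminative model: a one-parameter logistic regression whose posterior over theta is approximated on a grid (the data are made up; a real problem would need MCMC or variational inference):

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.5, 1.5, 2.0])  # toy 1-D inputs
y = np.array([0, 0, 1, 1, 1])              # toy binary labels

theta_grid = np.linspace(-5, 5, 1001)
p = 1.0 / (1.0 + np.exp(-np.outer(theta_grid, x)))  # p(y=1 | x, theta)

# Bernoulli log-likelihood of each candidate theta, plus a N(0, 1) prior.
log_lik = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=1)
log_prior = -0.5 * theta_grid**2

post = np.exp(log_lik + log_prior)
post /= post.sum() * (theta_grid[1] - theta_grid[0])  # normalize the density
# `post` is the full posterior p(theta | data) on the grid, not a point estimate.
```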
Edit for clarity: The claim is that the choice of model (discriminative vs generative) and the choice of inferential methodology (Bayesian vs maximum likelihood, for example) are orthogonal choices.
A neural network doing (discriminative) binary classification with a cross-entropy loss is maximizing the likelihood rather than the posterior. Most Bayesian examples seem to specify a generative model (a Hidden Markov Model, for example) and then infer the posterior. But there's nothing preventing one from using Bayesian methods with discriminative models (generalized linear models, say) or maximum likelihood with generative models.
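The cross-entropy point can be checked directly: binary cross-entropy is the negative Bernoulli log-likelihood, so minimizing it is exactly maximum likelihood estimation (a sketch with made-up predictions):

```python
import numpy as np

def cross_entropy(y, p):
    """Standard binary cross-entropy loss."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def neg_log_likelihood(y, p):
    """Negative log-likelihood under a Bernoulli model of y."""
    return -np.mean(np.log(np.where(y == 1, p, 1 - p)))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])  # hypothetical p(y=1 | x, theta)
assert np.isclose(cross_entropy(y, p), neg_log_likelihood(y, p))

# Adding a log-prior (e.g. L2 weight decay = Gaussian prior) to the same
# objective gives MAP estimation -- still a point estimate, not a posterior.
```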