I'm on board with all of this; I think even before GANs it was becoming popular to optimize a loss that wasn't necessarily a log-likelihood.
But I'm confused by the usage of the phrase "generative model", which I took to always mean a probabilistic model of the joint distribution that can be sampled from. I get that GANs generate data samples, but it seems different.
This is the problem when people use technical terms loosely and interchangeably with their English definitions. Generative model classifiers are precisely as you describe. They model a joint distribution that one can sample.
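To make the classical sense concrete, here's a minimal sketch of a generative model classifier: a toy two-class Gaussian model of the joint p(x, y), with hypothetical parameters I've made up. You can do ancestral sampling from the joint, and Bayes' rule turns the same model into a classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a two-class model of p(x, y) = p(y) p(x | y).
priors = np.array([0.4, 0.6])    # class priors p(y)
means  = np.array([-2.0, 2.0])   # mean of each class-conditional p(x | y)
stds   = np.array([1.0, 1.0])

def sample_joint(n):
    """Ancestral sampling: draw y ~ p(y), then x ~ p(x | y)."""
    y = rng.choice(2, size=n, p=priors)
    x = rng.normal(means[y], stds[y])
    return x, y

def posterior(x):
    """Bayes' rule turns the generative model into a classifier:
    p(y | x) ∝ p(x | y) p(y)."""
    lik = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    unnorm = lik * priors
    return unnorm / unnorm.sum()

x, y = sample_joint(1000)
print(posterior(3.0))  # heavily favours the class centred at +2
```

That dual use (sample from the joint, classify via the posterior) is exactly what "generative model classifier" traditionally meant; a GAN's generator gives you only the first half.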
GANs cannot even fit this definition, because a GAN is not a classifier. It is composed of a generator and a discriminator. The discriminator is a discriminative classifier; the generator is, well, a generator. It has nothing to do with generative model classifiers. Then you get some chain of "neural network generator" > "model that generates" > "generative model", and confusion follows.
Now, our model also describes a distribution p̂_θ(x) (green) that is defined implicitly by taking points from a unit Gaussian distribution (red) and mapping them through a (deterministic) neural network — our generative model (yellow). Our network is a function with parameters θ, and tweaking these parameters will tweak the generated distribution of images. Our goal then is to find parameters θ that produce a distribution that closely matches the true data distribution (for example, by having a small KL divergence loss). Therefore, you can imagine the green distribution starting out random and then the training process iteratively changing the parameters θ to stretch and squeeze it to better match the blue distribution.
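The "implicitly defined distribution" in that quote can be sketched in a few lines. Below is a toy deterministic generator with made-up fixed parameters θ: pushing unit-Gaussian noise through it defines a distribution p̂_θ(x) that we can sample from, even though we never write down its density.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed parameters θ of a tiny one-hidden-layer generator.
theta = {
    "W1": rng.normal(size=(2, 8)), "b1": np.zeros(8),
    "W2": rng.normal(size=(8, 2)), "b2": np.zeros(2),
}

def generator(z, theta):
    """Deterministic map from noise z to data space x."""
    h = np.tanh(z @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

z = rng.normal(size=(1000, 2))   # points from the unit Gaussian (red)
x = generator(z, theta)          # samples from p̂_θ(x) (green)
print(x.shape)                   # (1000, 2)
```

Training is then "stretch and squeeze": iteratively adjust θ so some divergence between the green samples and the blue data distribution shrinks. In a GAN that divergence is estimated adversarially by the discriminator rather than computed from an explicit density.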
This is precisely a generative model in the probabilistic sense. The section on VAEs spells this out even more explicitly:
For example, Variational Autoencoders allow us to perform both learning and efficient Bayesian inference in sophisticated probabilistic graphical models with latent variables (e.g. see DRAW, or Attend Infer Repeat for hints of recent relatively complex models).
The issue with GANs is that, while they model the joint probability of the input space, they aren't (easily) inspectable: you can't get any real understanding of how inputs relate to outputs. This makes them feel different from traditional generative models, where that understanding is usually a goal.