You're right — here they build the prior R(x) into the model architecture itself. So R(x) = 0, and instead they parameterize x = f_theta(z), where f_theta is a neural net with a fixed input z.
Overfitting is then avoided by stopping the training at the right moment.
In other words, they're not really trying to fully minimize E(x, x0), but only to reach a value small enough to give good results — driving E(x, x0) too low means fitting the noise, i.e. overfitting.
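To make the early-stopping idea concrete, here is a minimal sketch (my own toy example, not the authors' code): a tiny one-hidden-layer numpy net plays the role of f_theta, z is a fixed random vector, and E(x, x0) is the squared error against a noisy observation x0. Training runs for a small fixed budget of gradient steps rather than to convergence, which is the simplest form of the early stopping described above. All names (f_theta, the network sizes, the learning rate) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Denoising setup: x0 = clean signal + noise.
t = np.linspace(0, 1, 64)
clean = np.sin(2 * np.pi * t)
x0 = clean + 0.3 * rng.standard_normal(64)

# Fixed random input z; f_theta is a tiny one-hidden-layer net (toy stand-in).
z = rng.standard_normal(16)
W1 = 0.1 * rng.standard_normal((32, 16)); b1 = np.zeros(32)
W2 = 0.1 * rng.standard_normal((64, 32)); b2 = np.zeros(64)

def f_theta(z):
    h = np.tanh(W1 @ z + b1)
    return W2 @ h + b2, h

x_init, _ = f_theta(z)
e_init = np.mean((x_init - x0) ** 2)   # E(x, x0) before training

lr = 0.05
for step in range(500):  # early stopping: fixed small budget, not full minimization
    x, h = f_theta(z)
    r = x - x0                          # gradient of 0.5 * ||x - x0||^2 w.r.t. x
    gW2 = np.outer(r, h); gb2 = r       # backprop through the output layer
    gpre = (W2.T @ r) * (1 - h ** 2)    # backprop through tanh
    gW1 = np.outer(gpre, z); gb1 = gpre
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

x_final, _ = f_theta(z)
e_final = np.mean((x_final - x0) ** 2)  # small but nonzero: we stopped early
```

The point is that e_final is deliberately left above zero: running many more steps would push E(x, x0) toward zero and reproduce the noise in x0 exactly, which is the overfitting the early stop avoids.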