
You're right, and here they're folding the prior R(x) into the model architecture. So R(x) = 0 here,

and they parameterize x = f_theta(z), where f_theta is a neural net.

They avoid overfitting by stopping the training at the right moment.

So they're not really trying to minimize E(x, x0) all the way, but rather to reach a small enough value that they obtain good results. A value that is too small means the net has overfit to the noise in x0.
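To make that concrete, here's a minimal toy sketch of the idea (my own illustration, not the paper's code): a tiny two-layer MLP f_theta with a fixed random input z is trained by gradient descent on E(f_theta(z), x0) = ||f_theta(z) - x0||^2, and training is stopped early, once the loss reaches roughly the noise variance, rather than driven to zero. The network shapes, learning rate, and stopping threshold are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observation x0 = clean signal + noise
n = 64
clean = np.sin(np.linspace(0, 2 * np.pi, n))
noise_std = 0.3
x0 = clean + noise_std * rng.standard_normal(n)

# Fixed random input z and parameters theta = (W1, W2) of a tiny MLP
z = rng.standard_normal(16)
W1 = 0.1 * rng.standard_normal((32, 16))
W2 = 0.1 * rng.standard_normal((n, 32))

def forward(W1, W2, z):
    h = np.tanh(W1 @ z)     # hidden activations
    return W2 @ h, h        # f_theta(z) and h (kept for backprop)

lr = 0.01
stop_loss = noise_std ** 2  # hypothetical early-stopping threshold ~ noise variance
losses = []
for step in range(5000):
    x, h = forward(W1, W2, z)
    r = x - x0                       # residual
    loss = float(r @ r) / n          # E(f_theta(z), x0), mean squared error
    losses.append(loss)
    if loss < stop_loss:             # stop "at the right moment": don't fit the noise
        break
    # Manual backprop through the two layers
    g = 2 * r / n                    # dloss/dx
    gW2 = np.outer(g, h)
    gh = W2.T @ g
    gW1 = np.outer(gh * (1 - h ** 2), z)
    W2 -= lr * gW2
    W1 -= lr * gW1
```

Driving the loop to full convergence instead of breaking at `stop_loss` would make f_theta(z) reproduce x0, noise included, which is exactly the overfitting the early stop is meant to prevent.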


