Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwords.

Why does this happen? What characteristics does the network structure have that cause this effect?



I think the reason this paper is so interesting is that no one had any idea why this happens




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: