there are no other obvious mechanisms to explain why a smaller model with same or similar architecture would be better than a larger one.

Overfitting?

The consensus seems to be that most LMs are undertrained, not overfit, though.