WithinReason on Jan 11, 2023 | on: NanoGPT

There are no other obvious mechanisms to explain why a smaller model with the same or a similar architecture would be better than a larger one. Overfitting?

Der_Einzige on Jan 11, 2023

The consensus seems to be that the majority of LMs are undertrained, not overfitting, though.
