You can fit any data with enough parameters. What's tricky is to constrain a model so that it approximates the ground truth well where there are no data points. If a family of functions is extremely flexible and can fit all kinds of data very efficiently, I would argue that makes it harder for those functions to take correct values out of distribution.
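For a concrete toy version of "fit anything with enough parameters", here is a minimal numpy sketch (all names and numbers illustrative): a degree-9 polynomial has 10 free coefficients, so it can hit 10 completely arbitrary labels exactly, while its values away from the training points are essentially unconstrained.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0.0, 1.0, 10)
    y_train = rng.normal(size=10)        # arbitrary labels, no ground truth at all

    # 10 parameters for 10 points -> (near-)exact interpolation
    poly = np.poly1d(np.polyfit(x_train, y_train, deg=9))

    print(np.max(np.abs(poly(x_train) - y_train)))  # ~0: training points hit essentially exactly
    print(poly(1.5))  # outside the data the value is typically huge and meaningless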


Definitely. That's a fundamental observation called the bias-variance tradeoff. More flexible models are prone to overfitting, hitting each training point exactly with wild gyrations in between.
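A small numpy sketch of that tradeoff (setup is illustrative): fit the same noisy samples of a smooth function with a modest-degree and a near-interpolating polynomial, then compare error on held-out points. The flexible fit matches the training points but typically gyrates wildly between them.

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: np.sin(2 * np.pi * x)          # the ground truth we sample from

    x_train = np.linspace(0.0, 1.0, 15)
    y_train = f(x_train) + rng.normal(scale=0.1, size=15)   # noisy observations
    x_test = np.linspace(0.05, 0.95, 200)                    # held-out points in between

    for deg in (3, 14):                          # rigid vs. near-interpolating model
        poly = np.poly1d(np.polyfit(x_train, y_train, deg))
        train_mse = np.mean((poly(x_train) - y_train) ** 2)
        test_mse = np.mean((poly(x_test) - f(x_test)) ** 2)
        print(f"degree {deg}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")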

Big AI mitigates that problem by using more data. So much data that the model often sees each data point only once, which makes overfitting unlikely.
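Roughly what single-pass training buys you, as a toy numpy sketch (setup illustrative): when every example is drawn fresh and seen exactly once, each gradient step is taken on data the model has never fit before, so there is nothing to memorize and the running training loss already estimates generalization error.

    import numpy as np

    rng = np.random.default_rng(2)
    true_w = rng.normal(size=20)

    w = np.zeros(20)
    lr = 0.01
    for step in range(10_000):
        x = rng.normal(size=20)                  # a brand-new sample every step
        y = x @ true_w + rng.normal(scale=0.1)
        w -= lr * (x @ w - y) * x                # plain SGD on squared error

    print(np.linalg.norm(w - true_w))            # small: nothing was memorized, it had to generalize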


But with the data held constant, adding more and more parameters is a strategy that works, so what gives? Are the functions somehow getting regularized during training, so that in principle you could get away with fewer parameters and we just don't have the right model yet?
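One candidate answer is implicit regularization. A minimal numpy sketch under illustrative assumptions: in an overparameterized linear model there are infinitely many weight vectors that fit the training set exactly, but the minimum-norm one (which plain gradient descent reaches when started from zero) ends up much closer to the ground truth than an arbitrary interpolating solution.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 50, 500                               # far more parameters than data points
    true_w = np.zeros(d)
    true_w[:5] = 1.0                             # simple ground truth
    X = rng.normal(size=(n, d))
    y = X @ true_w

    # minimum-norm interpolator: X^+ y (what GD from zero init converges to)
    w_min = np.linalg.pinv(X) @ y

    # another exact interpolator: shift w_min by a vector in the null space of X
    v = rng.normal(size=d)
    null_dir = v - np.linalg.pinv(X) @ (X @ v)   # project v onto the null space
    w_other = w_min + 5.0 * null_dir

    X_test = rng.normal(size=(1000, d))
    for name, w in (("min-norm", w_min), ("arbitrary interpolator", w_other)):
        train_mse = np.mean((X @ w - y) ** 2)
        test_mse = np.mean((X_test @ w - X_test @ true_w) ** 2)
        print(f"{name}: train MSE {train_mse:.2e}, test MSE {test_mse:.2f}")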