
What do you think about the concept of "critical batch size"? https://openai.com/blog/science-of-ai/


I think the concept makes sense. The basic insight, that the right batch size depends on the difficulty and noisiness of the task, is already used by teams in practice. For example, the PaLM paper from last week reports increasing the batch size over the course of training.

But as far as I know, the more precise predictions of the optimal batch size aren't used much, probably because the relevant quantities are expensive to measure accurately, or because the predictive equation isn't accurate enough to begin with. I wonder if we can "transfer" the optimal batch size from a smaller setting (a smaller model or less data) to the full setting, like in our paper. That would make it much more practical.
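
To make "expensive to measure" concrete, here is a rough sketch of the "simple noise scale" estimate, B_simple = tr(Sigma) / |G|^2, as I understand it from the large-batch training paper that I believe the linked post summarizes. It compares the squared norm of the mean gradient at two batch sizes. The function names and the Gaussian toy setup are my own assumptions for illustration, not anyone's actual code, and a real run would use gradients from the model instead of synthetic noise.

```python
import numpy as np


def noise_scale_terms(g_small, g_big, b_small, b_big):
    """Return unbiased estimates of |G|^2 and tr(Sigma).

    g_small and g_big are mean gradients over batches of size b_small and
    b_big. Relies on E[|g_B|^2] = |G|^2 + tr(Sigma) / B.
    """
    sq_small = float(np.dot(g_small, g_small))
    sq_big = float(np.dot(g_big, g_big))
    # Solve the two equations above for |G|^2 and tr(Sigma).
    g2 = (b_big * sq_big - b_small * sq_small) / (b_big - b_small)
    s = (sq_small - sq_big) / (1.0 / b_small - 1.0 / b_big)
    return g2, s


def simulate_noise_scale(true_grad, sigma, b_small, b_big, trials, seed=0):
    """Toy check (assumed setup): per-example gradients ~ N(true_grad, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    d = true_grad.shape[0]
    g2_sum = 0.0
    s_sum = 0.0
    for _ in range(trials):
        # The mean of B i.i.d. per-example gradients has noise std sigma / sqrt(B).
        g_small = true_grad + rng.normal(0.0, sigma / np.sqrt(b_small), d)
        g_big = true_grad + rng.normal(0.0, sigma / np.sqrt(b_big), d)
        g2, s = noise_scale_terms(g_small, g_big, b_small, b_big)
        g2_sum += g2
        s_sum += s
    # Average numerator and denominator separately, then take the ratio;
    # single-trial ratios are very noisy.
    return (s_sum / trials) / (g2_sum / trials)


if __name__ == "__main__":
    grad = np.full(100, 0.1)  # |G|^2 = 1, tr(Sigma) = 100 when sigma = 1
    print(simulate_noise_scale(grad, sigma=1.0, b_small=8, b_big=64, trials=2000))
```

Even in this toy case it takes many gradient evaluations to get a stable number, and the true value drifts during training, which is part of why I suspect people fall back on heuristics.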



