
Don't forget that this model probably has far fewer params than o1 or even 4o. This is a compression/distillation, which means it frees up a lot of compute to build models much more powerful than o1. At the very least, this allows further scaling compute-wise (if not in the amount of non-synthetic source material available for training).


