
There is a lot of unpublished work on how to train models. Much of it is cleaning up the data or making synthetic data. That's the secret sauce. It was demonstrated by TinyStories and Phi-X, and now by the recent work on small datasets for math reasoning.
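
To make "cleaning up the data" concrete, here's a toy Python sketch: score each document with crude heuristics and keep only the ones above a threshold. The heuristics and threshold are illustrative assumptions on my part, not the actual TinyStories/Phi pipelines (those lean on model-based quality classifiers and synthetic generation).

    # Toy corpus-cleaning sketch: keep documents that look like natural prose.
    def quality_score(doc: str) -> float:
        words = doc.lower().split()
        non_space = [c for c in doc if not c.isspace()]
        if not words or not non_space:
            return 0.0
        # Fraction of alphabetic characters: penalizes markup/number soup.
        alpha_ratio = sum(c.isalpha() for c in non_space) / len(non_space)
        # Fraction of distinct words: penalizes spammy repetition.
        unique_ratio = len(set(words)) / len(words)
        return alpha_ratio * unique_ratio

    def filter_corpus(docs, threshold=0.6):
        return [d for d in docs if quality_score(d) >= threshold]

    if __name__ == "__main__":
        raw = [
            "The cat sat on the mat and watched the rain fall softly.",
            "click here click here click here click here",
            "aslkdj 3029 @@## zzzz",
        ]
        print(filter_corpus(raw))  # only the first document survives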



There's a huge effort going into understanding the statistical information in a large corpus of text, especially now that people have shown the language input can be reduced to carefully selected sources that still carry enough information for training.

The smaller the input for the same quality, the faster we can iterate, so everyone is pushing to get the minimum viable training time of a decent LLM down, both to make chain-of-thought cheaper as a concept and to allow for iteration and innovation.

As long as we lived in the future espoused by early OpenAI, of huge models on huge GPUs, we were going to stagnate. More GPU always means better in this game, but smaller, faster models mean you can do even more with even less. Now the major players see the innovation heading into the multi-LLM-instance arena, which is still dominated by whoever has the best training and hardware. But I expect to see disruption there too in time.





