
A better way to think about synthetic data is to consider code. With code, you can have an LLM generate code along with tests, then confirm that the code compiles and the tests pass. Now you have semi-verified new code you can add to your training data, and training on it will help you get better results for code, even though it was generated by a "less good" LLM. The check is what makes this work: it throws out the broken generations, so what you keep is better than the model's average output.
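Here's a minimal sketch of that filter step in Python, in case it helps make the idea concrete. The names `passes_checks` and `filter_candidates` are made up for illustration, and `CANDIDATES` is a hand-written stand-in for what an LLM would generate; a real pipeline would get those from a model call and would look different.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_checks(code: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Return True if the code compiles and its tests exit cleanly."""
    try:
        # "Does it compile?" For Python that means: does it parse.
        compile(code, "<candidate>", "exec")
    except SyntaxError:
        return False

    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + tests + "\n")
        try:
            # Run code plus tests in a separate interpreter so a crash or
            # infinite loop in a candidate cannot take down the filter.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
    # A failed assert raises, which makes the interpreter exit non-zero.
    return result.returncode == 0


def filter_candidates(candidates):
    """Keep only the candidates whose code passes their own tests."""
    return [c for c in candidates if passes_checks(c["code"], c["tests"])]


# Hypothetical stand-ins for LLM generations: one correct, one with a
# logic bug, one that does not even parse.
CANDIDATES = [
    {"code": "def add(a, b):\n    return a + b", "tests": "assert add(2, 3) == 5"},
    {"code": "def add(a, b):\n    return a - b", "tests": "assert add(2, 3) == 5"},
    {"code": "def add(a, b)\n    return a + b", "tests": "assert add(2, 3) == 5"},
]

kept = filter_candidates(CANDIDATES)
print(f"kept {len(kept)} of {len(CANDIDATES)} candidates")  # kept 1 of 3 candidates
```

Note this is only "semi-verified" in the same sense as above: the tests come from the same model, so a wrong function paired with equally wrong tests would still pass. Running in a subprocess is also not a sandbox, so untrusted generations would need real isolation.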

