
But who rates the synthetic data? If it is humans, I can understand that this is another way to get human knowledge into it, but if it's rated by AI, isn't it just a convoluted way of copying the rating AI's knowledge?



Many things are more easily scored than produced. It's trivial to tell whether a poem rhymes, but writing one is a comparatively slow and difficult task. Since scoring is easier and more discerning than generating, the idea is that you can generate stuff, classify it as good or bad, and then retrain on the good stuff. It's kind of an article of faith for a lot of AI companies and professionals as well, since it means you never have to face a data wall, and it's appealingly analogous to a human student practicing and learning.
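A toy sketch of that generate-then-filter loop, using the rhyming example (the rhyme check here is a deliberately crude stand-in; real scorers are learned models, but the asymmetry is the same):

```python
def rhymes(a: str, b: str) -> bool:
    """Crude scorer: do the last words share a two-letter suffix?
    Checking a candidate is cheap and reliable compared to writing one."""
    return a.split()[-1][-2:].lower() == b.split()[-1][-2:].lower()

# "Generation" here is just blind pairing -- slow and mostly wrong --
# but the cheap scorer lets us keep only the good outputs to retrain on.
lines = [
    "the cat sat on the mat",
    "a dog ran down the street",
    "he wore a little hat",
]
candidates = [(a, b) for a in lines for b in lines if a != b]
good = [pair for pair in candidates if rhymes(*pair)]
```

The point is only the cost asymmetry: `rhymes` is a few string operations, while producing a rhyming couplet from scratch is the hard part.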

As far as I know it doesn't work very well so far. It's prone to overfitting, where the scorer ranks highly some trivial detail of the output (e.g. "if a summary starts with a byline of the author, it's a sign of quality") and then the system starts looping on itself over and over, increasing the frequency and size of bylines until the output has run off to infinity and is just repeating a short phrase endlessly. Humans have good baselines and common sense that these ML systems lack; if you've ever seen one of those "deep dream" images, it's the same kind of idea. The "most possible dog" image can look almost nothing like a dog, in the same way that the "most possible poem" may look nothing like a poem.
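That degenerate feedback loop can be simulated in a few lines. This is a hypothetical toy, not any real training setup: the "reward model" is just a function that counts a spurious byline feature, and "training" greedily keeps whichever variant scores higher:

```python
def proxy_reward(summary: str) -> int:
    """A flawed learned scorer, caricatured: it noticed that good
    summaries often carried bylines, so it simply counts them."""
    return summary.count("By the author.")

def optimize(summary: str, steps: int) -> str:
    """Greedy self-improvement loop: at each step, keep whichever
    variant the proxy reward prefers. Since adding another byline
    always scores higher, the text collapses into repeated bylines."""
    for _ in range(steps):
        candidate = "By the author. " + summary
        if proxy_reward(candidate) > proxy_reward(summary):
            summary = candidate
    return summary

out = optimize("A short summary of the article.", steps=5)
```

After five rounds the "optimized" summary is five stacked bylines followed by the original text: the score went up monotonically while the actual quality cratered, which is the overfitting failure described above.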


This is the bit I've never understood about training AI on its own output; won't you just regress to the mean?


It's not trained on its own output. You can generate an effectively infinite supply of correctly worked-out math traces and train on those.
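A minimal sketch of what "generating correct traces" can mean: synthesize the problem and its answer together, so every trace is correct by construction and no model output is ever in the loop (the trace format here is an illustrative assumption):

```python
import random

def make_trace(rng: random.Random) -> str:
    """Generate an arithmetic problem together with its worked answer.
    Because we compute the answer ourselves, the trace is correct by
    construction -- nothing here depends on a model's own output."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"Q: What is {a} + {b}? A: {a} + {b} = {a + b}"

rng = random.Random(0)  # seeded for reproducibility
dataset = [make_trace(rng) for _ in range(1000)]
```

Real pipelines use far richer problem generators and verifiers, but the principle is the same: the data source is a program whose correctness you can check, not the model being trained.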




