
But who rates the synthetic data? If it is humans, I can understand that this is another way to get human knowledge into it, but if it's rated by AI, isn't it just a convoluted way of copying the rating AI's knowledge?



Many things are more easily scored than produced. It's trivial to tell whether a poem rhymes, but writing one is a comparatively slow and difficult task. Since scoring is easier and more discerning than generating, the idea is that you can generate stuff, classify it as good or bad, and then retrain on the good stuff. It's kind of an article of faith for a lot of AI companies and professionals as well, since it means you never have to face a data wall, and it's appealingly analogous to a human student practicing and learning.
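A toy sketch of that generate-then-filter loop, using the rhyming example (the rhyme check here is a deliberately crude stand-in; real scorers are learned models, but the asymmetry is the same):

```python
def rhymes(a: str, b: str) -> bool:
    """Crude scorer: do the last words share a two-letter suffix?
    Checking a candidate is cheap and reliable compared to writing one."""
    return a.split()[-1][-2:].lower() == b.split()[-1][-2:].lower()

# "Generation" here is just blind pairing -- slow and mostly wrong --
# but the cheap scorer lets us keep only the good outputs to retrain on.
lines = [
    "the cat sat on the mat",
    "a dog ran down the street",
    "he wore a little hat",
]
candidates = [(a, b) for a in lines for b in lines if a != b]
good = [pair for pair in candidates if rhymes(*pair)]
```

The point is only the cost asymmetry: `rhymes` is a few string operations, while producing a rhyming couplet from scratch is the hard part.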

As far as I know it doesn't work very well so far. It's prone to overfitting, where the scorer ranks highly some trivial detail of the output (e.g. "if a summary starts with a byline of the author, it's a sign of quality") and then the system starts looping on itself over and over, increasing the frequency and size of bylines until the output has run off to infinity and is just repeating a short phrase endlessly. Humans have good baselines and common sense that these ML systems lack; if you've ever seen one of those "deep dream" images, it's the same kind of idea. The "most possible dog" image can look almost nothing like a dog, in the same way that the "most possible poem" may look nothing like a poem.
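That degenerate feedback loop can be simulated in a few lines. This is a hypothetical toy, not any real training setup: the "reward model" is just a function that counts a spurious byline feature, and "training" greedily keeps whichever variant scores higher:

```python
def proxy_reward(summary: str) -> int:
    """A flawed learned scorer, caricatured: it noticed that good
    summaries often carried bylines, so it simply counts them."""
    return summary.count("By the author.")

def optimize(summary: str, steps: int) -> str:
    """Greedy self-improvement loop: at each step, keep whichever
    variant the proxy reward prefers. Since adding another byline
    always scores higher, the text collapses into repeated bylines."""
    for _ in range(steps):
        candidate = "By the author. " + summary
        if proxy_reward(candidate) > proxy_reward(summary):
            summary = candidate
    return summary

out = optimize("A short summary of the article.", steps=5)
```

After five rounds the "optimized" summary is five stacked bylines followed by the original text: the score went up monotonically while the actual quality cratered, which is the overfitting failure described above.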


This is the bit I've never understood about training AI on its own output; won't you just regress to the mean?


It's not trained on its own output. You can generate an effectively infinite supply of correctly worked-out math traces and train on those.
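A minimal sketch of what "generating correct traces" can mean: synthesize the problem and its answer together, so every trace is correct by construction and no model output is ever in the loop (the trace format here is an illustrative assumption):

```python
import random

def make_trace(rng: random.Random) -> str:
    """Generate an arithmetic problem together with its worked answer.
    Because we compute the answer ourselves, the trace is correct by
    construction -- nothing here depends on a model's own output."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"Q: What is {a} + {b}? A: {a} + {b} = {a + b}"

rng = random.Random(0)  # seeded for reproducibility
dataset = [make_trace(rng) for _ in range(1000)]
```

Real pipelines use far richer problem generators and verifiers, but the principle is the same: the data source is a program whose correctness you can check, not the model being trained.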




