
Except that whether the data is human-generated doesn't really seem to matter; all that seems to matter is some basic guard rails on the data you choose. Meta has models generating training data, then grading it and selecting the best examples to reincorporate into the training set, and it's improving benchmarks.



The problem with model collapse is that it reinforces the mean at the cost of the edges of your distribution curve, particularly over repeated rounds.
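To make that concrete, here is a toy sketch of the effect, not a claim about how any real model is trained. It assumes a one-dimensional Gaussian "model" and stands in for the preference for typical outputs by dropping the 20% of samples farthest from the mean before each refit. The function name `collapse_demo` and all of its parameters are made up for illustration.

```python
import random
import statistics


def collapse_demo(generations=10, n=1000, keep=0.8, seed=0):
    """Return the spread (population std dev) of the data at each generation."""
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spreads = [statistics.pstdev(data)]
    for _ in range(generations):
        mu = statistics.fmean(data)
        # Assumption: favoring typical outputs is modelled by keeping only
        # the `keep` fraction of samples closest to the current mean.
        data.sort(key=lambda x: abs(x - mu))
        kept = data[: int(keep * n)]
        # Refit the Gaussian on the filtered data, then sample the next
        # generation's training set from the refit model.
        mu, sigma = statistics.fmean(kept), statistics.pstdev(kept)
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        spreads.append(statistics.pstdev(data))
    return spreads


if __name__ == "__main__":
    for gen, spread in enumerate(collapse_demo()):
        print(f"generation {gen}: std dev = {spread:.4f}")
```

In this setup the mean barely moves while the spread shrinks every round, so the tails are what disappear first.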

One thing being overlooked is that the job loss from AI replacing mean work is going to be offset by new markets for edge-case creation and curation.

Think Jackson Pollock and Hunter S. Thompson for the AI generation, with a primary audience of AI rather than humans, sponsored by large tech and data companies acting like the new Renaissance Vatican.


That problem only exists as long as benchmarks don't sample the problem space well enough, and it can be quickly rectified once identified.


The industry already has a much bigger issue with benchmarks and Goodhart's Law as it is, so I'm skeptical that benchmarks are the solution here.



