Most human authors are frankly far too stupid to be worth reading, even if they do put care into their work.

This, IMO, is the actual biggest problem with LLMs training on whatever the largest available text corpus happens to be: they don't account for the fact that not all text is equally worthy of next-token prediction. This problem is completely solvable, almost trivially so, but I haven't seen anyone publicly describe a (scaled, in production) solution yet.
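
One plausible shape such a fix could take is per-document quality weighting: score every document with a classifier and drop or down-weight low-quality text before pretraining. A minimal sketch, assuming a hypothetical score_quality classifier (this is an illustration, not any lab's actual pipeline):

    # Curate a corpus with a quality classifier before next-token training.
    # score_quality is a hypothetical callable mapping a document to a
    # float in [0, 1]; any small quality/reward model could play this role.
    def curate(documents, score_quality, threshold=0.5):
        kept = []
        for doc in documents:
            score = score_quality(doc)
            if score >= threshold:         # discard text below the quality bar
                kept.append((doc, score))  # keep the score as a sampling weight
        return kept

The returned weights could then drive a weighted sampler, so higher-quality text is seen more often during training.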



> This problem is completely solvable, almost trivially so, but I haven't seen anyone publicly describe a (scaled, in production) solution yet.

Can you explain your solution?


I imagine it looks something like "Censor all writing that contradicts my worldview."


It hardly matters what sources you use if you filter them through something with less understanding than a two-year-old (if any), no matter how eloquently it can express itself.



