Hacker News

I mean, I'm sure curated datasets that are guaranteed (within some margin of error, probably) not to have been AI-generated will be a valuable commodity, or rather already are. Maybe that would include old Reddit posts/comments; I'm not sure how far back you'd need to go. (You would know better than I.)


An old Reddit post is how we ended up with glue on pizza.


But how can you know your curated dataset isn't curated by an AI?

After all, the company providing it needs to make more money, and using people is costly.


The problem with Reddit posts is that they might be edited. But if you limit yourself to non-edited comments from before 2022 you’d be good.
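For what it's worth, that filter is easy to sketch. This is a rough illustration, not a vetted pipeline: it assumes each comment is a dict shaped like Reddit's JSON, with an `edited` field (`False`, or a timestamp if edited) and a `created_utc` epoch timestamp. The names `CUTOFF`, `is_unedited_pre_2022`, and `filter_comments` are made up for the example.

```python
from datetime import datetime, timezone

# Cutoff assumed from the comment above: anything created before 2022-01-01 UTC.
CUTOFF = datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp()


def is_unedited_pre_2022(comment: dict) -> bool:
    """Return True for comments flagged as never edited and created before the cutoff.

    Assumes Reddit-style fields: `edited` is False or an edit timestamp,
    `created_utc` is seconds since the epoch.
    """
    # A missing `edited` field is treated as "edited" to stay conservative.
    edited = comment.get("edited", True)
    created = comment.get("created_utc")
    if created is None:
        return False
    return edited is False and created < CUTOFF


def filter_comments(comments: list[dict]) -> list[dict]:
    """Keep only the comments that pass the unedited, pre-2022 check."""
    return [c for c in comments if is_unedited_pre_2022(c)]
```

Note that this only trusts whatever the `edited` flag says; it can't detect an edit that the source data doesn't record.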



