

smitelli 4 months ago | on: OpenAI says it has evidence DeepSeek used its mode...

> training off of data generated by another AI is generally a bad idea

Ah. So if I understand this... once the internet becomes completely overrun with AI-generated articles of no particular substance or importance, we should not bulk-scrape that internet again to train the subsequent generation of models.

I look forward to that day.

bangaladore 4 months ago

That's already happened. It's well established now that the internet is tainted. Essentially since ChatGPT's public release, a not-insignificant amount of internet content has not been written by humans.

tensor 4 months ago

Yes, this is a real and serious concern that AI researchers have.
