They've obviously been thinking about this for a while and are well aware of the pitfalls of training on AI-generated content. That's why they're making such aggressive moves into video, audio, and other more robust forms of ground truth. Do you really think they aren't aware of this issue?

It's funny whenever people bring this up: they act as if AI companies are mindless juggernauts that will train without caring about data quality at all, end up with worse models, and then release them anyway for some reason. Don't people realize that attention to data quality is the core differentiator that led companies like OpenAI to their market dominance in the first place?
