Ever more AI-generated synthetic content will be created, partly because current systems still can't determine with certainty whether a piece of content was produced by AI. And AIs will, intentionally or unintentionally, train on synthetic content produced by other AIs.
AI generators don't have a strong incentive to watermark the synthetic content they produce, and they don't offer reliable AI-detection tools (or any tools at all) to help others identify content they generated.
Maybe some of them already embed a simple, secret marker to identify their own output. But people outside the organization wouldn't know, and even that can't prevent other companies from training models on the synthetic data.
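To make the idea concrete, here's a toy sketch of what such a hidden marker could look like. This is not any vendor's actual scheme (serious proposals are usually statistical, biasing token choices during sampling); the zero-width-character trick below is just the simplest way to show how a marker can be invisible to readers yet checkable by whoever knows the secret:

```python
# Toy illustration only: hide a marker in generated text using
# zero-width Unicode characters. Not any real vendor's scheme --
# just a demonstration of a secret, invisible marker.

ZWJ = "\u200d"   # zero-width joiner     -> encodes bit 1
ZWNJ = "\u200c"  # zero-width non-joiner -> encodes bit 0
SECRET = "AI"    # marker payload, known only to the generator

def embed_marker(text: str) -> str:
    """Append the secret payload as invisible zero-width bits."""
    bits = "".join(f"{ord(c):08b}" for c in SECRET)
    return text + "".join(ZWJ if b == "1" else ZWNJ for b in bits)

def has_marker(text: str) -> bool:
    """Only someone who knows the scheme can check for the payload."""
    bits = "".join("1" if c == ZWJ else "0"
                   for c in text if c in (ZWJ, ZWNJ))
    chars = [chr(int(bits[i:i + 8], 2))
             for i in range(0, len(bits) // 8 * 8, 8)]
    return SECRET in "".join(chars)

marked = embed_marker("A perfectly ordinary sentence.")
print(marked == "A perfectly ordinary sentence.")   # False, yet looks identical
print(has_marker(marked))                           # True
print(has_marker("A perfectly ordinary sentence.")) # False
```

The catch is visible in the sketch itself: anyone who strips zero-width characters destroys the marker, and only whoever holds the secret can check for it, so outsiders gain nothing. That's part of why detection would stay unreliable even if such markers existed.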
Once synthetic data becomes pervasive, some of it will inevitably end up in training sets. Then it'll be interesting to see how the information world evolves: AI-generated content built on synthetic data produced by other AIs. Over time, people may trust AI-generated content less and less.
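That feedback loop can be simulated in a few lines. The sketch below is a deliberately crude stand-in (the "model" is just an empirical word-frequency table, and the vocabulary and sample sizes are arbitrary choices), but it shows the mechanism: each generation trains only on text sampled from the previous generation, and any word the previous generation happens not to produce disappears forever.

```python
# Toy simulation of training on synthetic data. A "model" here is just
# the empirical distribution over a small vocabulary. Each generation
# is trained on samples drawn from the previous generation's model,
# never on real data, so the set of surviving words can only shrink.
import random
from collections import Counter

random.seed(42)

VOCAB = list(range(100))                 # 100 distinct "words"
corpus = random.choices(VOCAB, k=500)    # generation 0 sees real data

for generation in range(11):
    model = Counter(corpus)              # "train": count word frequencies
    words, counts = zip(*model.items())
    print(f"gen {generation:2d}: vocabulary size = {len(words)}")
    # "generate": sample the next corpus from the trained model only.
    corpus = random.choices(words, weights=counts, k=500)
```

Real models are vastly more complex, but the direction of the effect is the same: when every generation learns exclusively from the last one's output, whatever the previous generation never produced, the next generation can never learn.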