I wonder whether AI-generated web pages block AI agents from indexing their content, or whether we just get one engine indexing the other's output in a loop until the amount of digital garbage is so gigantic that it's the end of the information era. How do we ever stop this? What is our failsafe?
Is there an approximation/ratio at which the amount of digital garbage/hallucinations generated online by AI becomes so large that the internet can no longer be used to train AI itself? Are AI companies racing against the clock because, say, in 5 years the internet will be flooded with false information to such an extent that it becomes an invalid training ground? That would in a way require a snapshot of the pre-AI internet, because this feels like the clickbait problem times infinity.
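Nobody in this thread has real numbers, but you can get a feel for the dilution with back-of-the-envelope arithmetic. The toy Python below is a minimal sketch under made-up assumptions: a cumulative crawl starts from a pre-AI snapshot, grows by `growth_rate` per year, and `ai_share_of_new` of each year's new content is AI-generated (both parameters are invented for illustration, not estimates). It just shows the human-written share decaying toward 1 - ai_share_of_new rather than hitting one hard cutoff, which is why a clean pre-AI snapshot keeps coming up.

```python
# Back-of-the-envelope dilution model; every rate here is invented for illustration.

def human_fraction(years, growth_rate=0.3, ai_share_of_new=0.5):
    """Cumulative corpus grows by `growth_rate` per year; `ai_share_of_new`
    of each year's new content is AI-generated. Returns the human-written
    share of the corpus at the end of each year."""
    human, total = 1.0, 1.0  # normalise the pre-AI snapshot to size 1
    shares = []
    for _ in range(years):
        new = total * growth_rate               # content added this year
        human += new * (1.0 - ai_share_of_new)  # only part of it is human-written
        total += new
        shares.append(human / total)
    return shares

for year, share in enumerate(human_fraction(10), start=1):
    print(f"year {year:2d}: {share:.1%} of the corpus is human-written")
```

With those made-up defaults the human share slides from about 88% after one year to roughly 54% after ten, so the real question becomes at what share training actually degrades, and nobody knows that yet.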
It's already too late if you just want to scrape random horseshit off the internet. There will be real money in large, expert-generated data sets. AI is also a potential epistemological nightmare: it can cement bad knowledge and bury newer, more up-to-date knowledge under a sea of bullshit.