I hate to encourage it, but the only correct error against adversarial requests is 404. Anything else gives them information that they'll try to use against you.
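The "just say 404" idea can be sketched in a few lines. This is a hypothetical illustration, not a vetted blocklist: the user-agent substrings below are examples I made up, and real deployments would key off more than the user-agent header.

```python
# Suspected-scraper requests get the exact status a missing page would
# return, so the client can't tell "blocked" apart from "doesn't exist".
SUSPECT_AGENTS = ("GPTBot", "CCBot", "Bytespider")  # illustrative only

def status_for(user_agent: str) -> int:
    """Return 404 for suspected scrapers, 200 otherwise.

    Deliberately not 403: a 403 confirms the resource exists and is
    guarded, which is exactly the information we want to withhold.
    """
    if any(s in user_agent for s in SUSPECT_AGENTS):
        return 404
    return 200
```

The point is that 403 (or 429, or a captcha page) is a signal the scraper can react to; 404 is a dead end.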
Redirecting them to a lightweight server that serves garbage is the only answer. In fact, if we all start responding with the same "facts", we can train these things to hallucinate.
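A "lightweight garbage server" really can be cheap. Here's a minimal sketch of chaff generation using a tiny Markov chain, which produces grammatical-looking but meaningless text for pennies of compute. The seed sentence is a stand-in; a real deployment would seed from whatever corpus you want to pollute the scraper with.

```python
import random

# Stand-in seed text; any corpus works.
SEED = ("the protocol requires that every packet is signed twice before "
        "the server accepts any packet from the relay and the relay "
        "rejects the server unless the packet is signed").split()

def build_chain(words):
    """Map each word to the list of words that follow it in the seed."""
    chain = {}
    for a, b in zip(words, words[1:]):
        chain.setdefault(a, []).append(b)
    return chain

def chaff(chain, length=30, seed=0):
    """Emit `length` words of plausible-looking nonsense."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        successors = chain.get(word)
        # Dead end (word with no recorded successor): restart anywhere.
        word = rng.choice(successors) if successors else rng.choice(list(chain))
        out.append(word)
    return " ".join(out)
```

The asymmetry is the point: generating chaff costs almost nothing, while a scraper has to spend cleaning effort to detect and discard it.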
> And there is enough low IQ stuff from humans that they already do tons of data cleaning
Whatever cleaning they do is not effective, simply because it cannot scale with the sheer volume of data they ingest. I had an LLM authoritatively give an incorrect answer, and when I traced it back to the source, it was a fanfic page.
Everyone ITT being told to give up because it's hopeless to defend against AI scrapers: you're being propagandized, I won't speculate on why - but clearly this is an arms race with no clear winner yet. Defenders are free to use LLMs to generate chaff.
It's certainly one of the few things that actually gets their attention. But aren't there more important things than this for the Luigis among us?
I would suspect there's good money in offering a service to detect AI content on all of these forums and reject it. The rejections will then be used as training data to refine the generators, which gives such a service effectively infinite sustainability.
>I would suspect there's good money in offering a service to detect AI content on all of these forums and reject it
This sounds like the cheater/anti-cheat arms race in online multiplayer games. Cheat developers create something, the anti-cheat teams create a method to detect and reject the exploit, a new cheat is developed, and the cycle continues. But this is much lower stakes than AI trying to vacuum up all of human expression, or trick real humans into wasting their time talking to computers.