I think there was some sort of fake webserver that did something like this already: it just linked endlessly to more LLM-generated pages of nonsense.
There are lots more as well; those are just a few of the ones that recently made the rounds.
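That endless-link idea is simple enough to sketch. Here's a minimal, hypothetical version of the page generator (function and word list are mine, not from any actual tarpit project): every URL deterministically yields a page of nonsense links to deeper URLs, so the maze is infinite but looks stable to a revisiting crawler.

```python
import hashlib
import random

# Nonsense vocabulary for page titles and link text (arbitrary choices).
WORDS = ["quartz", "lattice", "ember", "sluice", "vireo", "tundra", "kelp"]

def maze_page(path: str, n_links: int = 8) -> str:
    """Render one maze page: a title plus links to deeper maze pages."""
    # Seed the RNG from the path so revisiting a URL yields the identical
    # page, which makes the maze look like stable, real content.
    rng = random.Random(hashlib.sha256(path.encode()).hexdigest())
    base = path.rstrip("/")
    items = []
    for _ in range(n_links):
        slug = f"{rng.choice(WORDS)}-{rng.randrange(10**6)}"
        text = f"{rng.choice(WORDS)} {rng.choice(WORDS)}"
        items.append(f'<li><a href="{base}/{slug}">{text}</a></li>')
    return (f"<html><body><h1>{rng.choice(WORDS)}</h1>"
            f"<ul>{''.join(items)}</ul></body></html>")
```

Wire that into any request handler and a crawler that follows links will descend forever, while the server does almost no work per page.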
I suspect that combining approaches will be a tractable way to waste crawlers' time:
- Anubis-esque proof-of-work systems to defeat or delay easily deterred or cut-rate crawlers,
- Cloudflare or similar for crawler deterrence that is more invasive to real humans (perhaps served only to a fraction of traffic, or to traffic that crosses a suspicion threshold?),
- Junk-content rings like Nepenthes as honeypots or "A/B tests" for whether a given traffic stream is an AI crawler: if it keeps following nonsense-content links endlessly, it's not a human; if it gives up pretty quickly, it might be. This costs/pisses off real users, but it can be used to train the traffic-analysis rules that trigger the other approaches on this list once likely-crawler traffic is detected,
- Model poisoners out of sheer pettiness, if it brings you joy.
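The "A/B test" in the Nepenthes bullet could be sketched as a simple per-client depth counter (the class, threshold, and depth-from-path convention here are all assumptions for illustration, not any project's actual API): humans abandon the maze after a page or two, while a crawler keeps descending past any plausible human depth.

```python
from collections import defaultdict

# Assumed cutoff: how many maze levels a human would plausibly follow.
# In practice this would be tuned against observed real traffic.
DEPTH_THRESHOLD = 5

class MazeDepthClassifier:
    """Label clients as likely crawlers by how deep they go into the maze."""

    def __init__(self, threshold: int = DEPTH_THRESHOLD):
        self.threshold = threshold
        self.depth = defaultdict(int)  # client_id -> deepest maze level seen

    def record_hit(self, client_id: str, path: str) -> None:
        # Maze URLs grow one path segment per level, so depth is just the
        # number of non-empty segments in the requested path.
        level = len([seg for seg in path.split("/") if seg])
        self.depth[client_id] = max(self.depth[client_id], level)

    def looks_like_crawler(self, client_id: str) -> bool:
        return self.depth[client_id] >= self.threshold
```

A positive result here could then feed the other layers, e.g. start serving that client the proof-of-work challenge or the more invasive deterrence.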
I also wonder if serving taboo content (e.g. porn/erotica that is legal but beyond the pale for most commercial applications) would deter some AI crawlers. There may be front-side content filters that blacklist or de-prioritize sites whose main content appears (to the crawler) to be some mix of inappropriate, prohibited, and not in enough demand as model output to be worth collecting.