Fava is great. My only problem is that I was lazy and haven't imported anything for at least 2 years, and now it feels too daunting to try and catch up.
It's not really a DDoS guard though. If someone wants to DDoS a server, Anubis isn't going to be able to stop the traffic before it gets to the server.
It does help against accidental DDoS, or just rude scrapers that assume everyone has unlimited bandwidth and money.
We’ve evolved. The only thing that pays attention to the OSI model is DDoS, which can hit you on every layer. And against application-layer attacks, Anubis and friends are effective.
One of my semi-personal websites gets crawled by AI crawlers a ton now. I use Bunny.net as a CDN. $20 used to last me for months of traffic; now it only lasts a week or two at most. It's enough that I'm going to go back to not using a CDN and just let the site suffer some slowness every once in a while.
I think there was already some sort of fake webserver that did something like this: it just linked endlessly to more LLM-generated pages of nonsense.
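If it helps to picture it, here's a toy sketch of that kind of tarpit in Python (the port, word list, and link scheme are all made up for illustration; a real one would generate more convincing filler and throttle itself):

    # Every page is junk text plus links to more randomly named pages,
    # so a crawler that keeps following links never runs out of URLs.
    import random
    import string
    from http.server import BaseHTTPRequestHandler, HTTPServer

    WORDS = ["synergy", "paradigm", "vertical", "quantum", "stakeholder", "lorem"]

    def nonsense(n=150):
        return " ".join(random.choice(WORDS) for _ in range(n))

    def random_path():
        return "/" + "".join(random.choices(string.ascii_lowercase, k=12))

    class Tarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            links = " ".join(
                '<a href="{0}">{0}</a>'.format(random_path()) for _ in range(10)
            )
            body = "<html><body><p>{}</p>{}</body></html>".format(nonsense(), links)
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

        def log_message(self, *args):
            pass  # keep the console quiet

    if __name__ == "__main__":
        HTTPServer(("", 8080), Tarpit).serve_forever()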
There are lots more as well; those are just a few of the ones that recently made the rounds.
I suspect that combining approaches will be a tractable way to waste crawlers' time (rough sketch of the wiring after the list):
- Anubis-esque systems to defeat or delay easily-deterred or cut-rate crawlers,
- CloudFlare or similar for more invasive-to-real-humans crawler deterrence (perhaps only served to a fraction of traffic or traffic that crosses a suspicion threshold?),
- Junk content rings like Nepenthes as honeypots or "A/B tests" for whether a particular traffic type is an AI or not (if it keeps following nonsense-content links endlessly, it's not a human; if it gives up pretty quickly, it might be). This costs/pisses off real users, but it can be used as a signal to train the traffic-analysis rules that trigger the other approaches on this list when likely-crawler traffic is detected.
- Model poisoners out of sheer pettiness, if it brings you joy.
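To make the combination concrete, here's a hypothetical sketch of the wiring (every name and threshold below is invented; the point is just that honeypot hits feed a per-client suspicion score, and the score picks the response):

    from collections import defaultdict

    suspicion = defaultdict(int)      # client key (IP, fingerprint, etc.) -> score
    HONEYPOT_PREFIX = "/maze/"        # links that only the junk-content ring emits

    def classify(client, path):
        # Endlessly following nonsense links is strong evidence of a crawler;
        # humans bail out of the maze quickly.
        if path.startswith(HONEYPOT_PREFIX):
            suspicion[client] += 5
        score = suspicion[client]
        if score >= 20:
            return "hard_challenge"    # CloudFlare-style, annoying to real humans
        if score >= 5:
            return "anubis_challenge"  # cheap proof-of-work gate
        return "serve_normally"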
I also wonder if serving taboo content (e.g. porn/erotica that's legal but beyond the pale for most commercial applications) would deter some AI crawlers. There might be front-side content filters that blacklist or de-prioritize sites whose main content appears (to the crawler) to sit at some intersection of inappropriate, prohibited, and not in enough demand as model output to be worth the trouble.
Try mosh and see if it helps with the input lag. IIRC it echoes keystrokes locally instead of waiting for the server to round-trip them, so it feels faster.
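For anyone curious why that works, a conceptual sketch of the local-echo idea (this isn't mosh's actual protocol, just the gist):

    class PredictiveTerminal:
        """Toy model of predictive local echo: draw keystrokes immediately,
        then reconcile with the server's authoritative state when it arrives."""

        def __init__(self, send_to_server):
            self.send_to_server = send_to_server
            self.screen = ""              # last confirmed state from the server

        def on_keypress(self, ch):
            self.draw(self.screen + ch)   # show it now -> no perceived input lag
            self.send_to_server(ch)       # deliver to the server asynchronously

        def on_server_update(self, screen_state):
            self.screen = screen_state    # server state wins; mispredictions fixed
            self.draw(self.screen)

        def draw(self, text):
            print("\r" + text, end="", flush=True)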