
I run a MediaWiki instance for an online community on a fairly cheap box (not a ton of traffic), but had a few instances of AI bots like Amazon's crawling expensive API pages thousands of times an hour (despite robots.txt rules disallowing them). Turning on Cloudflare's bot blocking made 50% of total traffic instantly go away. Even now, blocked bot requests make up 25% of total requests to the site. Without blocking I would have needed to upgrade quite a bit, or play a tiring game of whack-a-mole blocking new IP ranges for each of the dozens of bots.
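
For reference, the rules in question look roughly like this (a minimal sketch; /w/api.php is the API endpoint on a typical MediaWiki install, and Amazonbot is Amazon's published crawler token). Well-behaved crawlers honor it, but as described above, these bots did not:

    User-agent: Amazonbot
    Disallow: /

    User-agent: *
    Disallow: /w/api.php
    Disallow: /wiki/Special: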


AI bots are a huge issue for a lot of sites. Putting intentional DDoS attacks aside, AI scrapers can frequently tip over a site because many of them don't know how to back off. Google is the real exception: their experience building GoogleBot has ensured that it is never a problem.
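
Backing off is not hard, either. A minimal sketch of the polite behavior GoogleBot gets right, using Python's requests library (the user-agent and retry limits here are illustrative):

    import time
    import requests

    def polite_get(url, max_retries=5):
        """Fetch a URL, backing off exponentially on 429/503 responses."""
        delay = 1.0
        for attempt in range(max_retries):
            resp = requests.get(url, headers={"User-Agent": "example-crawler/1.0"})
            if resp.status_code not in (429, 503):
                return resp
            # Honor an explicit Retry-After header if the server sent one
            # (HTTP-date forms are ignored here for brevity).
            retry_after = resp.headers.get("Retry-After")
            wait = float(retry_after) if retry_after and retry_after.isdigit() else delay
            time.sleep(wait)
            delay *= 2  # double the wait between attempts
        raise RuntimeError(f"gave up on {url} after {max_retries} attempts")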

Many of the AI scrapers don't identify themselves. They live on AWS, Azure, Alibaba Cloud, and Tencent Cloud, so you can't really block them, and rate limiting also has limited effect since they just jump to new IPs. As a site owner, you can't exactly contact AWS and ask them to terminate a customer's service so that your site can recover.
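
You can block whole clouds wholesale, though, if you're willing to accept the collateral damage. AWS publishes all of its address ranges at https://ip-ranges.amazonaws.com/ip-ranges.json (the other providers maintain similar lists); a rough sketch that turns the EC2 prefixes into nginx deny rules:

    import json
    import urllib.request

    # AWS publishes its IP ranges as JSON; EC2 is where scrapers tend to run.
    RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    with urllib.request.urlopen(RANGES_URL) as resp:
        data = json.load(resp)

    with open("deny_aws.conf", "w") as out:
        for prefix in data["prefixes"]:
            if prefix["service"] == "EC2":
                out.write(f"deny {prefix['ip_prefix']};\n")

Include the generated file from an nginx server block. It's a blunt instrument: it also blocks every legitimate service that happens to be hosted on EC2.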


How do you feel, knowing that some portion of the 25% “detected bot traffic” is actually people in this comment thread?


You don't need buttflare's mystery juice to rate-limit or block bad users.
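
E.g., plain nginx does per-IP rate limiting out of the box (the numbers here are arbitrary; tune them to your traffic):

    # In the http block: track clients by IP, allow 10 requests/second each.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        location / {
            # Allow short bursts of 20, then answer the excess with 429.
            limit_req zone=perip burst=20 nodelay;
            limit_req_status 429;
        }
    }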



