
Yes, any crawler or some traffic peak can do this if your infrastructure is not well engineered


You've got it the wrong way around. It's: any crawler that's not well engineered, that doesn't follow robots.txt, that fakes its User-Agent, that gives you no way to contact its operators, that fetches content an indiscriminate number of times, repeatedly, all day long... can do this to your infrastructure unless you're a giant.

What these crawlers are doing is akin to DDoS attacks.
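For contrast, here is a minimal sketch of what "well engineered" looks like on the crawler side, using only the Python standard library: honor robots.txt, send an identifiable User-Agent with a contact address, and pace requests. The bot name and contact URL are hypothetical placeholders.

    import time
    import urllib.robotparser
    import urllib.request

    # Hypothetical identity: a descriptive name plus a URL where operators can be reached.
    USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-contact)"

    def polite_fetch(url: str, robots_url: str, default_delay: float = 5.0) -> bytes | None:
        # Check robots.txt before fetching anything.
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            return None  # disallowed: skip instead of hammering the site

        # Respect the site's declared crawl-delay, or fall back to a conservative pause.
        crawl_delay = rp.crawl_delay(USER_AGENT)
        time.sleep(crawl_delay if crawl_delay is not None else default_delay)

        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

The crawlers being complained about do the opposite on every point: no robots.txt check, a spoofed browser User-Agent, no contact information, and no delay between requests.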


Please do explain how you'd engineer a site to deal with a barrage of poorly written scrapers descending upon it. After you've done geo-IP routing, implemented various levels of caching, separated read/write traffic and bought an ever-increasing amount of bandwidth, what is there left to do?
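One of the few remaining levers is aggressive per-client rate limiting at the application edge. Below is a minimal in-memory token-bucket sketch, assuming a single server process; a real deployment would push this into the reverse proxy, CDN, or a shared store such as Redis. The thresholds are illustrative, not recommendations.

    import time

    RATE = 1.0    # tokens refilled per second, per client
    BURST = 20.0  # maximum burst size per client

    # client key (e.g. IP, or IP + User-Agent) -> (remaining tokens, last refill time)
    _buckets: dict[str, tuple[float, float]] = {}

    def allow_request(client_key: str) -> bool:
        """Return True if the client may proceed, False if it should get a 429."""
        now = time.monotonic()
        tokens, last = _buckets.get(client_key, (BURST, now))
        tokens = min(BURST, tokens + (now - last) * RATE)  # refill since last request
        if tokens < 1.0:
            _buckets[client_key] = (tokens, now)
            return False  # over the limit
        _buckets[client_key] = (tokens - 1.0, now)
        return True

The catch, as the thread points out, is that badly behaved scrapers rotate IPs and fake their User-Agent, so keying on either only goes so far.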

You could also get CloudFlare, or some other CDN, but depending on your size that might not be within your budget. I don't get why the rest of the internet should subsidize these AI companies. They're not profitable, they live off venture capital, and they increase everyone else's operating costs.



