> One crawler downloaded 73 TB of zipped HTML files in May 2024 [...] This cost ...

Ray20 · 2025-03-20T17:13:58 1742490838

I think uncontrolled price of cloud traffic - is a real fraud and way bigger problem then some AI companies that ignore robot.txt. One time we went over limit on Netlify or something, and they charged over thousand for a couple TB.

joepie91_ · 2025-03-21T18:14:01 1742580841

> I think uncontrolled price of cloud traffic - is a real fraud

Yes, it is.

> and way bigger problem then some AI companies that ignore robot.txt.

No, it absolutely is not. I think you underestimate just how hard these AI companies hammer services - it is bringing down systems that have weathered significant past traffic spikes with no issues, and the traffic volumes are at the level where literally any other kind of company would've been banned by their upstream for "carrying out DDoS attacks" months ago.

Ray20 · 2025-03-21T22:30:35 1742596235

>I think you underestimate just how hard these AI companies hammer services

Yeas, I completely don't understand this and don't understand comparing this with ddos attacks. There's no difference with what search engines are doing, and in some way it's worse? How? It's simply scraping data, what significant problems may it cause? Cache pollution? And thats'it? I mean even when we talking about ignoring robots.txt (which search engines are often doing too) and calling costly endpoints - what is the problem to add to those endpoints some captcha or rate limiters?