Google has made it very difficult to completely block their AI crawling, because the standard Googlebot search crawler also feeds data into AI Overviews and other AI features within Google Search. Google says there is a workaround, but it also blocks your site from fully indexing in Google Search. This is all covered in the article, though.
> What does an AI crawler do different from a search engine (indexing) crawler?
Many people don't want the extra bot traffic that AI brings to their site, especially when AI chat and Google's AI Overviews send so little traffic in return, and the traffic they do send pretty much always converts horrendously (something I've personally seen across multiple industries).
It doesn't seem like the extra traffic is the issue. People don't want Google's AI reading and summarizing their data and thus preventing clickthroughs. Why would I click on your site if Google already did the work of giving me the answer?
Both are an issue. People don't want AI overviews cannibalizing their website traffic. People also don't want AI bots spamming their website with outrageous numbers of requests every day.
In the specific case of Google would there be any additional traffic that isn't just the normal googlebot? I can't imagine they would bother crawling twice for every site on the internet.
Google-Extended is the user agent token associated with AI crawling, but Googlebot also crawls to produce AI Overviews in addition to indexing your website in Google Search.
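If it helps to see what that split looks like in practice, here's a minimal robots.txt sketch: it opts out of Google-Extended (the token Google documents for controlling use of your content in its AI models) while leaving Googlebot alone. As noted above, this does not keep your pages out of AI Overviews, because those are generated from the same index Googlebot builds.

```
# Opt out of Google-Extended but keep normal search indexing.
User-agent: Google-Extended
Disallow: /

# Googlebot still allowed (and still feeds AI Overviews).
User-agent: Googlebot
Allow: /
```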
While the number of crawlers and their overlapping responsibilities makes it hard to know which ones you can safely block, I should also say that the pure AI companies' bots behave 1000x worse than Google's crawlers when it comes to flooding your site with scraping requests.
This is a problem which needs regulatory action, not one which should be solved by a quasi-monopoly forcing compliance onto everyone except another quasi-monopoly that can use its market power to avoid it.
Require:
- respecting robots.txt and similar (see the sketch after this list)
- purpose binding/separation (of the crawler agent, but also of the retrieved data), similar to what GDPR does
- public documentation of each agent's purpose and stable agent identities
- no obfuscation of who is crawling what
- actual enforcement
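For what "respecting robots.txt" means on the crawler side, here's a minimal sketch using Python's standard urllib.robotparser; the agent name and URLs are made up for illustration:

```python
from urllib import robotparser

AGENT = "ExampleAIBot"  # hypothetical, stable agent identity
ROBOTS_URL = "https://example.com/robots.txt"

# Fetch and parse the site's robots.txt once.
rp = robotparser.RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()

def allowed(url: str) -> bool:
    # Check whether this agent may fetch the given URL.
    return rp.can_fetch(AGENT, url)

if __name__ == "__main__":
    for url in ("https://example.com/", "https://example.com/private/report"):
        print(url, "->", "fetch" if allowed(url) else "skip")
```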
And sure, making something illegal doesn't prevent anyone from being technically able to do it,
but now at least large companies like Google have to decide whether they want to commit a crime, and the more they obfuscate what they are doing, the more proof there is that it was done in bad faith, i.e. the more judges can push punitive damages.
Combine that with internet gateways like CF trying to provide technical enforcement and you might have a good solution.
But one quasi-monopoly trying to force another to "comply" with its money-making scheme (even if it's in the interest of the end user) smells a lot like a winnable case against CF wrt. unfair market practices, abuse of monopoly power, etc.
I find it wild that you focus on CF being a monopoly here when they are providing tools that help publishers not have all of their content stolen and repurposed. AI companies have been notorious over the last few years for not respecting any directives and spamming sites with requests to scrape all of their data.
There is also nothing stopping other CDN/DNS providers from spinning up a marketplace similar to what CF is building now.
I thought we were broadly opposed to regulatory action for a number of reasons, including anti-socialism ideology, dislike of "red tape", and belief that free markets can solve problems.