The problem with this is in the vein of `Requires immediate total cooperation fr...

xp84 · 2025-05-10T18:02:36 1746900156

If some action forced Google to divest its crawler, wouldn’t a decent solution be to do what browser user agents have always done (multiple times, to comical degrees) and carry the old name forward in the new string? Something like ‘Googlebot/3.1 (successor, CommonCrawl 1.0)’
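A rough sketch of why that could work, using Python's standard-library `urllib.robotparser`. The robots.txt contents, the URL, and the `CCBot/2.0` comparison string are made up for illustration. This parser matches rules against the token before the first `/` in the user agent, so a site that only allows `Googlebot` would still admit the successor string. Real sites and CDNs may match differently (for example on IP ranges), so treat this as an assumption, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Googlebot is allowed, everyone else is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The successor string from the comment, plus an unrelated crawler for contrast.
SUCCESSOR_UA = "Googlebot/3.1 (successor, CommonCrawl 1.0)"
OTHER_UA = "CCBot/2.0"  # illustrative only
URL = "https://example.com/page"

# robotparser compares rules against the token before the first "/",
# so the successor string is treated as "Googlebot".
successor_allowed = parser.can_fetch(SUCCESSOR_UA, URL)
other_allowed = parser.can_fetch(OTHER_UA, URL)

print("successor allowed:", successor_allowed)
print("other crawler allowed:", other_allowed)
```

The point is only that existing allow rules keyed on the old name would keep applying without any site owner touching their config.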

toomuchtodo · 2025-05-10T18:35:54 1746902154

Lots of good replies to your comment already. I'd also suggest Cloudflare offering the option to crawl customer origins and ship the compressed archives off to Common Crawl for storage. This gives site admins and owners control over the crawling, and reduces unnecessary load, since someone like Cloudflare can manage the crawler worker queue and network shipping internally.

(Cloudflare customer, no other affiliation)

kzrdude · 2025-05-10T19:23:47 1746905027

That says that if Google switches over to CCBot then the rest will follow.

CPLX · 2025-05-10T18:00:23 1746900023

I mean, if it’s created as part of setting the global rules for the internet, you could just make it opt-out.