> I doubt many people are doing things to allow Googlebot but also ban other search crawlers.
Sadly, this is just not the case.[1][2] Google knows this too, which is why Googlebot explicitly crawls from a specific IP range that they publish.[3]
I also know this firsthand, because I ran a website that blocked any bot outside that IP range. We had honeypot links (hidden from humans via CSS) that insta-banned any user or bot that clicked/fetched them. A User-Agent from curl, wget, or any HTTP lib = insta-ban. Crawling links sequentially across multiple IPs = all banned. Any signal we found that indicated you were not a human using a web browser = ban.
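The IP-range check is straightforward to sketch. Something like the following, where the hardcoded prefixes are illustrative samples only; a real deployment should periodically fetch the authoritative googlebot.json list Google publishes (link [3]) rather than hardcoding anything:

```python
import ipaddress

# Sample subset of Googlebot ranges, for illustration only.
# Fetch the published googlebot.json for the real, current list.
GOOGLEBOT_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
    ipaddress.ip_network("2001:4860:4801::/48"),
]

def is_googlebot_ip(client_ip: str) -> bool:
    """Return True if client_ip falls inside a published Googlebot range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in GOOGLEBOT_RANGES)
```

Anything claiming to be Googlebot from an IP outside those ranges can be banned without risking your search listing.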
We were listed on Google and never had traffic issues.
[1] https://onescales.com/blogs/main/the-bot-blocklist
[2] Chart in the middle of this page: https://blog.cloudflare.com/declaring-your-aindependence-blo... (note: Google-Extended != Googlebot)
[3] https://developers.google.com/search/docs/crawling-indexing/...