Hacker News

> Crawling the internet is a natural monopoly. Nobody wants an endless stream of bots crawling their site,

Companies want traffic from any source they can get. They welcome every search engine crawler that comes along, because every bit of exposure translates into an incremental chance at revenue or audience growth.

I doubt many people are doing things to allow Googlebot but also ban other search crawlers.

> My heart just wants Google to burn to the ground

I think there’s a lot of that in this thread and it’s opening the door to some mental gymnastics like the above claim about Google being the only crawler allowed to index the internet.



> I doubt many people are doing things to allow Googlebot but also ban other search crawlers.

Sadly, this is just not the case.[1][2] Google knows this too, which is why they explicitly crawl from specific IP ranges that they publish.[3]
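For anyone curious what that allowlisting looks like in practice, here's a minimal sketch. Google publishes its crawler ranges as JSON (see [3]); the sample data below is hardcoded for illustration, and a real deployment would fetch and periodically refresh the live list.

```python
import ipaddress
import json

# Sample data in the shape of Google's published crawler-range JSON.
# Hardcoded here for illustration; fetch the live file in production.
SAMPLE_GOOGLEBOT_JSON = json.dumps({
    "prefixes": [
        {"ipv4Prefix": "66.249.64.0/27"},
        {"ipv6Prefix": "2001:4860:4801:10::/64"},
    ]
})

def load_networks(raw_json):
    """Parse the prefix list into ip_network objects."""
    nets = []
    for entry in json.loads(raw_json)["prefixes"]:
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        nets.append(ipaddress.ip_network(prefix))
    return nets

def is_allowed_crawler(ip_str, networks):
    """True if the client IP falls inside any published crawler range."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in networks)

nets = load_networks(SAMPLE_GOOGLEBOT_JSON)
print(is_allowed_crawler("66.249.64.5", nets))   # inside the sample range
print(is_allowed_crawler("203.0.113.9", nets))   # outside -> treat as a bot
```

Any crawler claiming to be Googlebot from outside those ranges can be banned outright.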

I also know this firsthand, because I ran a website that blocked any bot outside of those IP ranges. We had honeypot links (hidden from humans via CSS) that insta-banned any user or bot that clicked/fetched them. User-Agent from curl, wget, or any HTTP lib = insta-ban. Crawling links sequentially across multiple IPs = all banned. Any signal we found that indicated you were not a human using a web browser = ban.
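The first two signals can be sketched in a few lines. Everything here is hypothetical (the trap path, the UA patterns, the in-memory ban set); the sequential-crawl detection is omitted since it needs per-IP request history.

```python
import re

# Hypothetical trap path; the real links were hidden from humans via CSS.
HONEYPOT_PATH = "/do-not-follow"

# User-Agent substrings that reveal HTTP libraries rather than browsers.
SCRIPTED_UA = re.compile(r"curl|wget|python-requests|libwww|go-http-client", re.I)

banned_ips = set()

def should_ban(path, user_agent):
    """Honeypot hit or scripted User-Agent = insta-ban."""
    if path == HONEYPOT_PATH:
        return True
    if SCRIPTED_UA.search(user_agent or ""):
        return True
    return False

def handle_request(ip, path, user_agent):
    """Return an HTTP status: 403 for banned/bannable clients, 200 otherwise."""
    if ip in banned_ips:
        return 403
    if should_ban(path, user_agent):
        banned_ips.add(ip)
        return 403
    return 200

print(handle_request("198.51.100.7", "/", "curl/8.4.0"))               # 403
print(handle_request("198.51.100.8", "/do-not-follow", "Mozilla/5.0")) # 403
print(handle_request("198.51.100.9", "/", "Mozilla/5.0"))              # 200
```

In practice you'd persist the ban set and exempt IPs that pass the allowlist check above, so legitimate Googlebot fetches of the honeypot don't get banned.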

We were listed on Google and never had traffic issues.

[1] https://onescales.com/blogs/main/the-bot-blocklist

[2] Chart in the middle of this page: https://blog.cloudflare.com/declaring-your-aindependence-blo... (note: Google-Extended != Googlebot)

[3] https://developers.google.com/search/docs/crawling-indexing/...





