> What incentive would Google have to continue populating that index?
Presumably they would still want to run google.com and make money off of it.
> Would I be breaking the law if I independently crawled and hosted an index without publishing an API for it?
No. But you would not get the advantage that Google gets when it crawls the web, so you would not have access to the large amount of data that nobody else has access to.
Updated based on edit of parent post:
> Maybe this is the problem that needs solving.
Why make websites waste money serving all those requests all over again? Why not have Google share the results, so that money can go toward more productive things than recreating that work? I don't think website operators would be happy if there were a hundred more crawlers out there, each crawling as much as Google does now.
Do any site operators actually block non-Google search engine crawlers because being listed on DDG/Bing/etc. isn't worth the extra cost of serving the crawler? It sounds a bit ridiculous unless they actually don't want to be found. Maybe they only allow GoogleBot because that's the only crawler they thought of, and the real extra cost is researching what all the other search engines call theirs.
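For what it's worth, a GoogleBot-only allowlist in robots.txt locks out every other crawler no matter what it calls itself. Here's a minimal sketch of how that looks to different bots, using Python's standard-library robotparser; the robots.txt rules, URL, and the last bot name are hypothetical:

```python
# How a Googlebot-only allowlist appears to different crawlers.
# The rules and URL below are made up for illustration.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("Googlebot", "bingbot", "DuckDuckBot", "SomeStartupCrawler"):
    ok = parser.can_fetch(bot, "https://example.com/some/page")
    print(f"{bot:20} allowed: {ok}")
```

Only Googlebot gets through; every name the operator never bothered to research falls under the `*` rule.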
Perhaps other search engines should spoof GoogleBot. Browsers have been doing that forever, spoofing Netscape (Mozilla), Safari, etc. for the same reason.
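Mechanically that kind of spoofing is just a header change. A rough sketch, assuming the Python `requests` library and a placeholder URL (the UA string is the one Googlebot advertises):

```python
# Sends Googlebot's published user-agent string with an ordinary request.
# Assumes the third-party `requests` library; the URL is a placeholder.
import requests

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def fetch_as_googlebot(url: str) -> str:
    """Fetch a page while presenting Googlebot's user-agent string."""
    resp = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(fetch_as_googlebot("https://example.com")[:200])
```

Whether it actually works is another matter: this only gets past checks that look at the header alone, since operators can verify the real Googlebot with a reverse DNS lookup.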
> Why don't we have Google share the results and we can use that money to do more productive things than recreating that work?
This sounds like a common fallacy of people criticizing the free market. Duplicated effort looks wasteful, but it turns out to be far more productive than the lack of incentive that comes from not being able to profit from your own work and investment.
> Do any site operators actually block non-Google search engine crawlers because being listed DDG/Bing/etc isn't worth the extra cost of serving the crawler?
Many website operators actually do block crawlers from non-Google search engines, and it's because the cost of being crawled isn't worth it to them. Here's a good quote from one such webmaster:
As a webmaster I get a bit tired of constantly having to deal with the startup crawler du jour.
From law firms looking for DMCA violations to vertical search engines, to image aggregators, to company intelligence resellers… It feels to me that everybody and their brother has gotten into spidering sites.
With 10,000s of pages that have content that is only relevant to a targeted audience who is perfectly able to find us on the majors, I do not hesitate to block (and possibly ban) when I see an aggressive crawler that does not provide me or my customers with direct benefits.
> Perhaps other search engines should spoof GoogleBot. Browsers have been doing that forever, spoofing Netscape (Mozilla), Safari, etc. for the same reason.
> This sounds like a common fallacy of people criticizing the free market.
I am asserting that crawling the web is a natural monopoly. This means that the free market has failed and that it is not possible for the market to heal itself in this regard. There is significant evidence that this is the case and I imagine you'll be hearing more and more about it soon.
I would think the site owner’s cost of being indexed is the same for every search engine that indexes the site.
The benefit varies with the quality of each search engine, and it gets larger the more a search engine is used, so a cost/benefit analysis may show that Google and a few other large ones are the only ones worth supporting.