Hacker News

I see it a bit differently: many (most?) websites explicitly deny scraping except for Google. Further, Google has the infrastructure to crawl several trillion web pages and create a relevant index out of the most authoritative 1.5 trillion. To re-create that on your own, you would need both the web to allow it and the infrastructure to do it. I would agree that this isn't an insurmountable moat, but it is a good one.


Most websites only explicitly deny scraping by bad bots, via robots.txt. Things like Cloudflare are a completely different matter, and I have a whole batch of opinions about how they are destroying the web.
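To make the robots.txt point concrete, here's a minimal sketch of the "allow Google, deny everyone else" pattern, checked with Python's standard-library parser. The robots.txt content and the crawler name "MyNewCrawler" are hypothetical, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets everything, all other
# user agents are disallowed from the entire site.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot is allowed; a new crawler falls through to the "*" group.
print(parser.can_fetch("Googlebot", "https://example.com/page"))      # True
print(parser.can_fetch("MyNewCrawler", "https://example.com/page"))   # False
```

Note that robots.txt is purely advisory; a compliant crawler chooses to honor it, which is exactly why server-side blocking like Cloudflare's is a different beast.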

I'd love to compete directly with OpenAI, but the cost of half a million GPUs is a me problem - not a them problem. Google can't be faulted for figuring out how to crawl the web in an economically viable way.




