
IMO this is an underappreciated advantage for Google. Nobody wants to block the GoogleBot, so they can continue to scrape for AI data long after AI-specific companies get blocked.
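Concretely, the "blocking" in question is just a robots.txt user-agent rule. A rough sketch of the pattern many sites have adopted (GPTBot is OpenAI's published crawler token and CCBot is Common Crawl's; the exact list varies by site):

    # Block AI-specific crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Leave Google Search crawling alone
    User-agent: Googlebot
    Allow: /

These rules are purely advisory; well-known crawlers generally honor them, but nothing enforces it.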

Gemini is currently embarrassingly bad given it came from the shop that:

1. invented the Transformer architecture

2. has (one of) the largest compute clusters on the planet

3. can scrape every website thanks to a long-standing whitelist




The new Gemini Experimental models are the best general-purpose models out right now. I have been comparing them with o1 Pro, and I prefer Gemini Experimental 1206 due to its context window, speed, and accuracy. Google came out with a lot of new stuff last week, if you haven't been following. They seem to have the best models across the board, including image and video.


Omnimodal and code/writing output still have a ways to go for Gemini - I have been following, and their benchmarks are not impressive compared to the competition, let alone my anecdotal experience of using Claude for coding, GPT for spec-writing, and Gemini for... occasional cautious optimism to see if it can replace either.


> Nobody wants to block the GoogleBot

This only remains true as long as website operators think that Google Search is useful as a driver of traffic. In tech circles Google Search is already considered a flaming dumpster heap, so let's take bets on when that sentiment percolates out into the mainstream.


If it reaches the point where google is no longer a useful driver of traffic then there's probably little point in having a website at all any more.


Strange take ... I seem to remember websites having a lot of point before google.


They had a point back then because no alternatives existed.

How many of those websites would be YouTube channels, podcasts, or social media accounts if those options had existed back then?

Nowadays most sites survive on traffic from Google; if that goes away, then most of those sites go away as well.


They had a lot of point because...

1. They were major sites that served as initial starting points for traffic

2. Search engines pointed to them and people could locate them.

---

That was all a long time ago. Now people tend to go to a few 'all-in-one' sites: Google, Reddit, '$big social media'. Other than Google, most of those places optimize to keep you on that particular site rather than send you to other people's content. The 'web' was a web of interconnectedness. Now it's more like a singularity. Once you pass the event horizon of their domain you can never escape again.


For OpenAI, they could lean on their relationship with Microsoft for Bing crawler access

Websites won’t be blocking the search engine crawlers until they stop sending back traffic, even if they’re sending back less and less traffic


Wonder if OpenAI is considering building a search engine for this reason... Imagine if we get a functional search engine again from some company just trying to feed its next model generation...


There are two user-agent tokens to distinguish: "Googlebot" and "Google-Extended".
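Googlebot is the Search crawler; Google-Extended is the separate token Google added so sites can opt out of Gemini/Vertex AI training without affecting Search indexing (it's a control token rather than a separate bot; Googlebot still does the fetching). A minimal sketch of how the two tokens end up being treated differently, using a hypothetical robots.txt checked with Python's stdlib parser:

    from urllib import robotparser

    # Hypothetical robots.txt: opt out of Gemini training ("Google-Extended")
    # while leaving Google Search crawling ("Googlebot") untouched.
    robots_lines = [
        "User-agent: Google-Extended",
        "Disallow: /",
        "",
        "User-agent: Googlebot",
        "Allow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)  # parse() accepts an iterable of robots.txt lines

    print(rp.can_fetch("Googlebot", "https://example.com/page"))        # True
    print(rp.can_fetch("Google-Extended", "https://example.com/page"))  # False

Both tokens are honored by convention only; nothing technical stops a crawler from ignoring them, which is the "courtesy" point below.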


That seems to be more like a courtesy that Google could stop extending at any point than a requirement grounded in law or legal precedent.


Same goes for OpenAI ignoring these "blocks".



