
The big takeaway here is that Google's dominance over the web (and that of advertising in general) is going away.

This is because the only way to stop the bots is with a captcha, which also stops search indexers from crawling your site. The result is that search engines stop indexing sites and hence stop providing value.

There will probably be a lag as the knowledge in current LLMs dries up, because no one will be able to scrape the web in an automated fashion anymore.

It'll all burn down.



I actually envision Lyapunov stability, like wolf and rabbit populations. In this scenario, we're the rabbits. Human content will increase when AI populations decrease, thus providing more food for the AI, which will then increase. This drowns out human expression, and the humans grow quieter. That provides less fodder for the AI, and they decrease. This means less noise, and the humans grow louder. The cycle repeats ad nauseam.
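For reference, the dynamics being described are essentially the Lotka-Volterra predator-prey equations (a textbook form, with human content as prey $x$ and AI output as predator $y$; the variable names are just illustration, not anything from the thread):

$$\frac{dx}{dt} = \alpha x - \beta xy, \qquad \frac{dy}{dt} = \delta xy - \gamma y$$

Their closed orbits around the equilibrium are exactly the boom-and-bust cycling described above.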


Until broken by the Butlerian Jihad: "Thou shalt not make a machine in the likeness of the mind of man."


I've thought along similar lines for art: what ecological niches are there where AI can't participate, where training data is harder or uneconomical to pull, and where humans can flourish?


Anything we humans deem private in nature from other humans.



If the logistic driving parameter is large enough, it can also lead to complete chaos.
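Presumably this refers to the logistic map,

$$x_{n+1} = r\,x_n(1 - x_n),$$

which settles to a fixed point for small $r$, goes through period-doubling oscillations as $r$ grows, and becomes chaotic for $r \gtrsim 3.57$.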


IMO this was one of the real motives for Web Environment Integrity. Allow Google to index but nobody else.

We're kind of stuck between a rock and a hard place here. Which do you prefer, entrenched incumbents or affordable/open hosting?


I’m supremely confident that attestation will arrive in one form or another in the near future.

Anonymous browsing and potentially-malicious bots look identical. This was sort of OK up until now.


Agreed, it seems inevitable. Unfortunately I think it will also result in further centralization & consolidation into a handful of "trusted" megacorps.

If you thought browser fingerprinting for ad tracking was creepy, just wait until they're using your actual fingerprint.


does indeed sound like we're headed right back to AOL. At least this time it'll be faster? Certainly won't be as charming.


Google is already scraping your site and presenting answers directly in search results. If I cared about traffic (hence selling ad space), why would I want my site indexed by Google at all anymore? Lots of advertising-supported sites are going to go dark because only bots will visit them.


It will entrench established search engines even more if sites have to move to auth-based crawling, so that the only crawlers allowed are the ones you invite. Most people will do this for Google, Bing, and maybe one or two others if there is a simple tool for it.


> The big takeaway here is that Google's dominance over the web (and that of advertising in general) is going away.

AI companies with the best anti-captcha mechanics will win, and they will inject ads into LLM output in more sophisticated ways.


This could not be further from the truth. The ad business is not going anywhere; it will grow even bigger.

OpenAI is still going through the initial cycle of enshittification, and Google is too big right now. Once they establish dominance, you will have to sit through five unskippable ads between prompts, even on a paid plan.

I solved this problem for myself. Most of my web projects use client-side processing, and I moved to GitHub Pages, so clients can use my projects with no downtime. The pages use SQLite as the data source: the browser first downloads the SQLite database, then uses it to display data on the client side.

Example 'search' project: https://rumca-js.github.io/search
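For anyone curious, here is a minimal sketch of that pattern, assuming sql.js as the in-browser SQLite engine and a hypothetical data.sqlite file with an entries(title, url) table published next to the page (not the actual schema of the linked project):

```ts
// Query a pre-built SQLite file entirely in the browser: fetch the
// database published with the static site, open it with sql.js (WASM
// SQLite), and run SQL locally with no server round-trip.
import initSqlJs from "sql.js";

export async function search(term: string) {
  const SQL = await initSqlJs({
    // Location of the sql.js WASM binary (adjust to wherever it is hosted).
    locateFile: (file: string) => `https://sql.js.org/dist/${file}`,
  });

  // data.sqlite and the entries(title, url) table are illustrative names.
  const buf = await fetch("data.sqlite").then((r) => r.arrayBuffer());
  const db = new SQL.Database(new Uint8Array(buf));

  const results = db.exec(
    "SELECT title, url FROM entries WHERE title LIKE $q LIMIT 20",
    { $q: `%${term}%` }
  );
  db.close();
  return results;
}
```

The appeal is that the host only ever serves static files, so there is nothing for scrapers to overload and nothing to keep online beyond the CDN.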


The stated problem was about indexing, accessing content and advertising in that context.

> I solved this problem for myself. Most of my web projects use client-side processing, and I moved to GitHub Pages, so clients can use my projects with no downtime. The pages use SQLite as the data source: the browser first downloads the SQLite database, then uses it to display data on the client side.

> Example 'search' project: https://rumca-js.github.io/search

That is not really a solution. Since typical indexing still works for the masses, your approach is currently unusual. But in the end, bots will be capable of reading web page content wherever a human is capable of reading it, and we are back at the original problem of trying to tell bots apart from humans. It's the only way.


What about the next generation of AI that will be able to sign up autonomously? Even if we implemented auth walls everywhere right now, what's stopping the companies from hiring some very cheap labor to create accounts on websites and use them to scrape the content?

Is it going to become another race, like adblocker -> adblocker detector -> detector bypass, and so on?


Can we not just have a whitelist of allowed crawlers and ban the rest by default? Then places like DuckDuckGo and Google can provide a list of IP addresses that their crawlers will come from, and we simply don't include major LLM providers like OpenAI.


How do you distinguish crawlers from regular visitors using a whitelist? As stated in the article, the crawlers show up with seemingly unique IP addresses and seemingly real user agents. It's a cat and mouse game.

Only if you operate at the scale of Cloudflare, etc., can you see which IP addresses are hitting a large number of servers in a short time span.

(I am pretty sure the next step will be handing out N free LLM requests per month in exchange for user machines doing the scraping, if blocking gets more successful.)

I fear the only solutions in the end are CDNs, making visits expensive using challenges, or requiring users to log in.
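To make "making visits expensive using challenges" concrete, here is a sketch of a hashcash-style proof-of-work check; this is my own illustration, not any particular CDN's scheme, and the difficulty constant is an assumption:

```ts
// Hashcash-style proof of work: the server hands out a random challenge,
// and the client must find a nonce whose SHA-256(challenge + nonce) has a
// given number of leading zero bits before the request is let through.
import { createHash, randomBytes } from "node:crypto";

const DIFFICULTY_BITS = 20; // ~1M hash attempts on average (tune to taste)

export function makeChallenge(): string {
  return randomBytes(16).toString("hex");
}

function leadingZeroBits(buf: Buffer): number {
  let bits = 0;
  for (const byte of buf) {
    if (byte === 0) { bits += 8; continue; }
    bits += Math.clz32(byte) - 24; // clz32 counts a 32-bit value; bytes use the low 8 bits
    break;
  }
  return bits;
}

export function verifySolution(challenge: string, nonce: string): boolean {
  const digest = createHash("sha256").update(challenge + nonce).digest();
  return leadingZeroBits(digest) >= DIFFICULTY_BITS;
}

// Client side: brute-force a nonce. Cheap for one page view, expensive
// when multiplied across millions of crawled URLs.
export function solve(challenge: string): string {
  for (let n = 0; ; n++) {
    if (verifySolution(challenge, String(n))) return String(n);
  }
}
```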


How are the crawlers identifying themselves? If it's user agent strings then they can be faked. If it's cryptographically secured then you create a situation where newcomers can't get into the market.


Google publishes the IP addresses that Googlebot uses. If something claims to be Googlebot but is not coming from one of those addresses, it's a fake.
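A sketch of that check, assuming Google's published googlebot.json list keeps its current URL and { prefixes: [{ ipv4Prefix }] } shape (worth re-checking against their docs); IPv6 prefixes are skipped for brevity:

```ts
// Check whether a visitor claiming to be Googlebot really comes from one
// of Google's published crawler IP ranges.
const GOOGLEBOT_RANGES_URL =
  "https://developers.google.com/static/search/apis/ipranges/googlebot.json";

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) + Number(octet)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  if (bits === 0) return true;
  const mask = (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

export async function isGooglebotIp(ip: string): Promise<boolean> {
  const { prefixes } = await fetch(GOOGLEBOT_RANGES_URL).then((r) => r.json());
  return prefixes.some((p: { ipv4Prefix?: string }) =>
    p.ipv4Prefix ? inCidr(ip, p.ipv4Prefix) : false
  );
}
```

You would only run this for requests whose user agent claims to be Googlebot; everything else falls through to whatever default policy you pick.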


And in that case both systems end up in a situation where new entrants can't enter the market.


I don't see how that helps the case where the UA looks like a normal browser and the source IP looks residential.


How about if they claim to be Google Chrome running on Windows 11, from a residential IP address? Is that a human or an AI bot?



I am pretty sure a number of crawlers are running inside the mobile apps of phone users so they can get residential IP pools.


This is scary!


The problem is many crawlers pretend to be humans. So to ban the rest of the crawlers by default, you'll have to ban humans.


This sort of positive security model with behavioural analysis is the future. We need to get it built into Apache, Nginx, Caddy, etc. The trick is telling crawlers apart from regular users. It can be done, though.


Or an open list of IPs identified as belonging to AI companies, updated regularly, that firewalls can easily pull from? (Same idea as open-source AV.)


> Or an open list of IPs identified as belonging to AI companies, updated regularly, that firewalls can easily pull from? (Same idea as open-source AV.)

I don't really know about this proposal; the majority of bots are going to be coming from residential IPs the minute you do this.[1]

[1] The AI SaaS will simply run a background worker on the client to do their search indexing.


You can have a whitelist for allowed users and ban everyone else by default, which I think is where this will eventually take us.


AI is good at solving captchas. But even if everyone added a captcha, search engines would continue indexing, because it is easy to add authentication that lets search engines bypass the captcha: Google would just need to publish a public key.
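A sketch of what "publish a public key" could look like in practice: the crawler signs each request and the site verifies the signature against the published key. The header names, the signed-string format, and the Ed25519 choice are all my own illustrative assumptions; no such standard exists today.

```ts
// Verify a crawler request signed with the search engine's private key.
// The site only needs the engine's published public key (PEM) to check it.
import { createPublicKey, verify } from "node:crypto";

export function isAuthenticCrawler(
  publicKeyPem: string, // the key the search engine would publish
  req: { method: string; path: string; headers: Record<string, string> }
): boolean {
  // Illustrative header names, not an existing convention.
  const signature = req.headers["x-crawler-signature"];
  const date = req.headers["x-crawler-date"];
  if (!signature || !date) return false;

  // The crawler is assumed to have signed "METHOD path date".
  const signedString = `${req.method} ${req.path} ${date}`;

  return verify(
    null, // Ed25519 uses no separate digest algorithm
    Buffer.from(signedString),
    createPublicKey(publicKeyPem),
    Buffer.from(signature, "base64")
  );
}
```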


This is fine, as Google's utility as a search engine has turned into a hot pile of garbage, at least for my use cases. Where a decade ago I could put in a few keywords and get relevant results, I now have to guide it with several "quoted phrases" and -exclusions to get the result I'm looking for on the second or third result page. It has crumbled under its own weight, and seems to suggest irrelevant trash to me first and foremost because it's the website of some big player or content farm. Either their algorithm is tuned for mass manipulation or they lost the arms race with SEO cretins (or both).

Granted, I'm not looking forward to some LLM condensing all the garbage and handing me a Definitive Answer (TM) based on the information it deems relevant for inclusion.



