My question is, can we serve PoW challenges to these AI LLM scrapers that can be...

xena · 2025-03-20T13:18:22 1742476702

That's what I've been doing! It works shockingly well. https://github.com/TecharoHQ/anubis

KolmogorovComp · 2025-03-20T13:20:55 1742476855

Unless I am missing something, the result of that generated work has no monetary value though.

xena · 2025-03-20T13:40:40 1742478040

I was inspired by https://en.wikipedia.org/wiki/Hashcash, which was proof of work for email to disincentivize spam. To my horror, it worked sufficiently for my git server so I released it as open source. It's now its own project and protects big sites like GNOME's GitLab.

01HNNWZ0MV43FF · 2025-03-20T13:48:57 1742478537

That's cool! What if instead of sha256 you used one of those memory-hard functions like scrypt? Or is sha needed because it has a native impl in browsers?
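To make the idea concrete, here is a rough sketch of what a memory-hard variant could look like, using `hashlib.scrypt` from the Python standard library. This is only an illustration and not how Anubis works: the function names, the `challenge`/`nonce` layout, and the cost parameters `N`, `R`, `P` are all assumptions picked for the example.

```python
import hashlib
import itertools

# Assumed scrypt cost parameters, chosen only for illustration.
# Memory use is roughly 128 * N * R bytes per hash (about 1 MiB here).
N, R, P = 2**10, 8, 1


def scrypt_digest(challenge: str, nonce: int) -> bytes:
    """Hash one candidate nonce with scrypt, salted by the challenge."""
    return hashlib.scrypt(
        str(nonce).encode(), salt=challenge.encode(), n=N, r=R, p=P, dklen=32
    )


def meets_difficulty(digest: bytes, difficulty: int) -> bool:
    """True if the digest starts with at least `difficulty` zero bits."""
    total_bits = len(digest) * 8
    return int.from_bytes(digest, "big") >> (total_bits - difficulty) == 0


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking costs one scrypt call (and its memory) on the server."""
    return meets_difficulty(scrypt_digest(challenge, nonce), difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force nonces; about 2**difficulty scrypt calls on average."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    nonce = solve("example-challenge", 6)
    print(nonce, verify("example-challenge", nonce, 6))
```

One tradeoff worth noting: unlike plain SHA-256, every verification here also pays the scrypt memory cost, so the server side gets more expensive too.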

xena · 2025-03-20T13:57:03 1742479023

Right now I'm using SHA-256 because this project was originally written as a vibe sesh rage against the machine. The second reason is that the combination of Chrome/Firefox/Safari's JIT and WebCrypto being native C++ is probably faster than what I could write myself. Amusingly, supporting this means it works on very old/anemic PCs like the Power Mac G5 (which doesn't support WebAssembly because it's big-endian).

I'm gonna do experiments with xeiaso.net as the main testing ground.

ziddoap · 2025-03-20T13:29:05 1742477345

The monetary value is not having a misbehaving AI bot download 73TB or whatever of your data.

cess11 · 2025-03-20T13:55:49 1742478949

Interesting idea. Seems to me it might be possible to use with a Monero mining challenge instead, for those low real traffic applications where most of the requests are sure to be bots.

jsheard · 2025-03-20T13:21:31 1742476891

I'm curious whether the PoW component is really necessary. AIUI, untargeted crawlers are usually curl wrappers that don't run JavaScript, so requiring even a trivial amount of JS would defeat them. Unless AI companies are so flush with cash that they can afford to just use headless Chrome for everything, efficiency be damned.

xena · 2025-03-20T13:38:49 1742477929

Sadly, in testing the proof of work is needed. The scrapers run JS because if you don't run JS the modern web is broken. Anubis is tactically designed to make them use modern versions of Firefox/Chrome at least.

They really do use headless Chrome for everything. My testing has shown a lot of them are on DigitalOcean. I have a list of IP addresses in case someone from there is reading this and can have a come-to-Jesus conversation with those AI companies.

blibble · 2025-03-20T13:32:09 1742477529

these companies have more compute than everyone else in the world put together

a proof of work function will end up selecting FOR them!

anthk · 2025-03-20T13:42:37 1742478157

Use judo techniques. Turn their own computing power against them: serve fake links to random Markov-generated bullshit until their cache is poisoned beyond repair, since cleaning it up is impossible. The LLMs begin to either forget their own stuff or hallucinate once their input is basically fed from other LLMs (or themselves).
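For anyone who hasn't seen one, a word-level Markov text generator is only a few lines. This is a toy sketch, assuming a tiny seed corpus and made-up helper names (`build_chain`, `generate`); it only shows the technique and is not taken from any existing tarpit tool.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], length: int, seed=None) -> str:
    """Walk the chain to produce plausible-looking nonsense (at least one word)."""
    rng = random.Random(seed)
    starts = sorted(chain)  # sorted so a fixed seed gives a repeatable result
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (a word with no recorded follower): restart somewhere random.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the bot reads the page and the page feeds the bot more pages"
    print(generate(build_chain(corpus), 15, seed=1))
```

Seeding the generator from the requested URL (an assumption, not something described above) would make each fake page stable across visits while staying cheap to produce.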

01HNNWZ0MV43FF · 2025-03-20T13:46:26 1742478386

It'll still keep your site from getting hammered

blibble · 2025-03-20T14:25:02 1742480702

until some drone working for the parasites (google/facebook/openai) sees this post and writes 5 lines of code to defeat it

and now you have an experience where the bots have an easier time accessing your content than legitimate visitors

GTP · 2025-03-20T15:05:03 1742483103

What would those 5 lines of code look like? The basis of this solution is that it asks the client to solve a computationally intensive problem whose solution, once provided, is cheap to check. How would those 5 lines of code change this?
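To spell out the asymmetry, here is a minimal Hashcash-style sketch in Python. It is not Anubis's actual code: the `challenge:nonce` string format, the function names, and the difficulty values are assumptions for illustration. Solving takes about 2**difficulty hashes on average, while checking a submitted nonce takes exactly one.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a nonzero byte, bit_length() gives the position of its top set bit.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap side: one SHA-256 call, no matter how hard the puzzle was."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Expensive side: brute-force nonces until one passes verification."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    nonce = solve("example-challenge", 16)
    print(nonce, verify("example-challenge", nonce, 16))
```

Each extra bit of difficulty doubles the expected work for the client and leaves the server's check unchanged, so a shortcut would have to come from somewhere other than the hashing itself.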

blibble · 2025-03-20T15:06:42 1742483202

nice try, Google employee

GTP · 2025-03-20T15:17:45 1742483865

Lol, such a childish excuse to not answer.

rapnie · 2025-03-20T13:53:43 1742478823

I know that mCaptcha is based on PoW. It may be usable here.

https://mcaptcha.org/

rswail · 2025-03-20T13:16:03 1742476563

Finally, a reason for bitcoins!