
My question is: can we serve PoW challenges to these AI/LLM scrapers in a way that is profitable?



That's what I've been doing! It works shockingly well. https://github.com/TecharoHQ/anubis


Unless I am missing something, the result of that generated work has no monetary value though.


I was inspired by https://en.wikipedia.org/wiki/Hashcash, which was proof of work for email to disincentivize spam. To my horror, it worked sufficiently for my git server so I released it as open source. It's now its own project and protects big sites like GNOME's GitLab.


That's cool! What if instead of SHA-256 you used one of those memory-hard functions like scrypt? Or is SHA needed because it has a native implementation in browsers?
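
To make the memory-hard idea concrete, here is a minimal sketch of a scrypt-based challenge. It is not how Anubis works; the function names, the nonce encoding, and the scrypt parameters (n=2**10, r=8, p=1, roughly 1 MiB per hash) are assumptions picked for illustration, and it relies on Python's `hashlib.scrypt`, which needs an OpenSSL build with scrypt support.

```python
import hashlib


def scrypt_digest(challenge: bytes, nonce: int) -> bytes:
    # Assumed parameters: n=2**10, r=8 needs about 128 * n * r = 1 MiB per hash.
    return hashlib.scrypt(
        str(nonce).encode(), salt=challenge, n=2**10, r=8, p=1, dklen=32
    )


def has_leading_zero_bits(digest: bytes, bits: int) -> bool:
    # True when the top `bits` bits of the digest are all zero.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve_memory_hard(challenge: bytes, bits: int) -> int:
    # Brute-force search: each attempt costs one full scrypt evaluation.
    nonce = 0
    while not has_leading_zero_bits(scrypt_digest(challenge, nonce), bits):
        nonce += 1
    return nonce


def verify_memory_hard(challenge: bytes, nonce: int, bits: int) -> bool:
    # Verification still costs one scrypt evaluation, so it is not free.
    return has_leading_zero_bits(scrypt_digest(challenge, nonce), bits)
```

One trade-off this makes visible: unlike plain SHA-256, the server also pays the memory cost once per verification.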


Right now I'm using SHA-256 because this project was originally written as a vibe sesh rage against the machine. The second reason is that the combination of Chrome/Firefox/Safari's JIT and webcrypto being native C++ is probably faster than what I could write myself. Amusingly, supporting this means it works on very old/anemic PCs like PowerMac G5 (which doesn't support WebAssembly because it's big-endian).

I'm gonna do experiments with xeiaso.net as the main testing ground.


The monetary value is not having a misbehaving AI bot download 73TB or whatever of your data.


Interesting idea. Seems to me it might be possible to use with a Monero mining challenge instead, for those low real traffic applications where most of the requests are sure to be bots.


I'm curious whether the PoW component is really necessary. AIUI, untargeted crawlers are usually curl wrappers that don't run JavaScript, so requiring even a trivial amount of JS would defeat them. Unless AI companies are so flush with cash that they can afford to just use headless Chrome for everything, efficiency be damned.


Sadly, in testing the proof of work is needed. The scrapers run JS because if you don't run JS the modern web is broken. Anubis is tactically designed to make them use modern versions of Firefox/Chrome at least.

They really do use headless Chrome for everything. My testing has shown a lot of them are on DigitalOcean. I have a list of IP addresses in case someone from there is reading this and can have a come-to-Jesus conversation with those AI companies.


these companies have more compute than everyone else in the world put together

a proof of work function will end up selecting FOR them!


Use judo techniques. Use their own computing power against them: serve fake links to random Markov-generated bullshit until their caches are poisoned beyond recovery. LLMs begin to either forget what they learned or hallucinate once their input is basically fed from other LLMs (or from themselves).
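
For anyone who hasn't seen one, a word-level Markov generator is only a few lines. This is a rough sketch, assuming you already have some seed text to train on; `build_chain`, `generate`, and the sample corpus are made-up names for illustration, not part of any existing tool.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 1) -> dict:
    # Map each run of `order` words to the words seen right after it.
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain: dict, length: int, seed=None) -> str:
    # Walk the chain, jumping to a random state whenever we hit a dead end.
    rng = random.Random(seed)
    keys = sorted(chain)
    order = len(keys[0])
    out = list(rng.choice(keys))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if followers:
            out.append(rng.choice(followers))
        else:
            out.extend(rng.choice(keys))
    return " ".join(out[:length])


if __name__ == "__main__":
    corpus = "the bot reads the page and the page feeds the bot more pages"
    print(generate(build_chain(corpus), 30))
```

The output is locally plausible and globally meaningless, which is the point: it costs almost nothing to produce per request.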


It'll still keep your site from getting hammered


until some drone working for the parasites (google/facebook/openai) sees this post and writes 5 lines of code to defeat it

and now you have an experience where the bots have an easier time accessing your content than legitimate visitors


What would those 5 lines of code look like? The basis of this solution is that it asks the client to solve a computationally intensive problem whose solution, once provided, is cheap to check. How would those 5 lines of code change this?
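
To spell out the asymmetry, here is a minimal Hashcash-style sketch. It is not Anubis's actual code; the function names, the nonce encoding, and the "leading zero bits" difficulty rule are assumptions for illustration. Finding a nonce takes about 2**bits SHA-256 evaluations on average, while checking it takes exactly one.

```python
import hashlib
import os


def digest_for(challenge: bytes, nonce: int) -> bytes:
    # Hash the server's challenge together with the candidate nonce.
    return hashlib.sha256(challenge + str(nonce).encode()).digest()


def meets_difficulty(digest: bytes, bits: int) -> bool:
    # True when the top `bits` bits of the digest are all zero.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve(challenge: bytes, bits: int) -> int:
    # Client side: brute force, roughly 2**bits hashes on average.
    nonce = 0
    while not meets_difficulty(digest_for(challenge, nonce), bits):
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    # Server side: a single hash, regardless of difficulty.
    return meets_difficulty(digest_for(challenge, nonce), bits)


if __name__ == "__main__":
    challenge = os.urandom(16)  # fresh per visitor, so answers can't be reused
    nonce = solve(challenge, 16)
    print(f"solved after {nonce + 1} hashes; verified: {verify(challenge, nonce, 16)}")
```

No short patch on the client removes the search loop: the only way to pass `verify` is to present a nonce that actually hashes below the target.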


nice try, Google employee


Lol, such a childish excuse to not answer.


I know that mCaptcha is based on PoW. It may be usable here.

https://mcaptcha.org/


Finally, a reason for bitcoins!



