> Let's hear some real solutions. It's by no means a full solution (there likely...

tjoff · on Jan 22, 2020

> Part of the reason why everyone is trying to detect bots is because bots will very, very rapidly eat up your bandwidth and CPU time.

It is?

Thought bot detection was only done during registration etc. to stop them from sending spam etc. to real users.

If anything the javascript world we live in helps combat this. You need insane resources on the client just to have a page open. Several orders of magnitude more than the server need to generate and send that page.

AstralStorm · on Jan 22, 2020

In that case, an IP or IP block throttling is good enough.

Except then there are those pesky CGNATs to handle including Chinese Great Wall.

Anyway, high profile spammers will emulate enough of the browser to render any measure based on browser anomaly detection worthless. Including using a headless browser. The only way to defeat them would be too put some quite computationally intensive JS operation... (On par with mining, ruining all the laptops, phones and tablets. But you can make it not trigger every time.) This would make spamming expensive.

Server-side we have excellent AI spam filters that nobody seems to be using to fire off a captcha check later. The big problem here is that you cannot offload to some provider without inviting big privacy concerns. (Same problem as forum/chat/discussion platform providers.)

massaman_yams · on Jan 22, 2020

No. Botnets are large and broadly distributed enough to render protection methods based only on the IP or IP block ineffective. They're commonly used for mailbombing attacks such as those described here: https://www.wired.com/story/how-journalists-fought-back-agai...

Do you think a botnet with 10k machines is going to be meaningfully inhibited by making each machine's cpu run calculations for a second or two for each submission?

I'm sure reCAPTCHA looks at the IP and IP block as one of the inputs to its ML algorithm, but as one or two of perhaps a dozen different features - including mouse movement and/or keyboard input, which is quite a bit harder to fake.

thu2111 · on Jan 22, 2020

high profile spammers will emulate enough of the browser to render any measure based on browser anomaly detection worthless

Based on actual experience of fighting spammers, that isn't the case. Like a lot of people new to spam fighting you're making assumptions about the adversaries that aren't valid.

bradknowles · on Jan 23, 2020

There are many different types of spammers and attackers.

Some will be stopped by the simplest protection mechanisms.

Some will be indistinguishable from real humans, and you won’t be able to stop them without crippling your services for your real users.

But those are the two extremes. The real problem is the ones between those extremes.

Every intentional stumbling block you put in the path to try and stop those in the middle might also have a negative impact on your real users. The real problem is that the most troublesome attackers will learn and adapt to whatever stumbling blocks you put in the path. So, how many of your own toes are you willing to sacrifice with your foot guns in the name of stopping the attackers?

thu2111 · on Jan 23, 2020

Very few, but that's OK. Good spam fighters don't have to sacrifice many or really any toes to stop nearly all spam. You seem to be assuming a linear relationship between effort and false positives, but that would be a very ineffective spam fighting team relative to the ones I've worked on. In practice you can have nearly no false positives combined with nearly no false negatives.

This isn't easy and many firms fail at it, but you it can be done and we routinely did it.

tjoff · on Jan 22, 2020

IP block for registration?

Seems highly unrealistic.

scottmotte · on Jan 21, 2020

> automatically increase the prices of requests by a fraction of a cent to compensate

Great concept.

CPU, bandwidth, electricity, it's all just energy. And to a significant degree, money is just energy stored. I generate energy with my own work, store it in the form of money, and then transfer that energy to someone else, maybe to heat my home or cook me a meal.

Before money, I had to barter for those things. Maybe conceptually the internet is in a similar state at the moment. It doesn't have 'money'. Why can't I put CPUs in my wallet and then spend them? And why can't I charge visitors to my site by the CPUs they are costing me?

Instead, I have to, in a way, barter. For example, maybe I use ad revenue to earn my income, so I generate all this content, I barter that to the search engines, which barter with the advertisers, which barter with me, and I barter back to security guards to protect me from 'bad' actor bots. I'd really just like to receive CPU and bandwidth payments from them.

perl4ever · on Jan 22, 2020

Isn't the reason we are freed from barter in daily life is because the government is intimately involved in the financial/banking system, and regulates it and issues money and so on? Maybe we continue to struggle with the internet because it started out unregulated and has never really transcended that because people insist on thinking freedom is best for commerce without appreciating the nuances.

rashkov · on Jan 22, 2020

There are alternatives to that. For all of the hype and vaporware of the cryptocurrency movement, the idea of digital-native programmable internet money is a powerful one. I’m personally excited by the idea of involving currency at the protocol level and having it interact naturally over tcp/ip and http. There is an alternative to ads if we can make it work.

eeZah7Ux · on Jan 22, 2020

> Before money, I had to barter for those things

Not at all. Barter was quite uncommon also unpractical. Most societies used (and use) social connections and trust.

thu2111 · on Jan 22, 2020

I used to work on spam fighting.

This sort of solution is frequently proposed but doesn't work, because:

• Serving costs are rarely the problem. Normally it's annoying actions taken by spammers and the bad reaction of valuable users that matters, not the machine cost of serving them.

There are occasional exceptions. Web search engines ban bots because left unchecked they can consume vast CPU resources but never click ads. However, they only get so much bot traffic because of SEO scraping. Most sites don't have an equivalent problem.

• There is no payment system that can do what you want. All attempts at creating one have failed for various hard reasons.

• You would lose all your users. From a user's perspective I want to access free content. I don't want to make micropayments for it, I especially don't want surge pricing that appears unrelated to content. Sites that use more typical spam fighting techniques to fend off DDoS attacks or useless bot traffic can vend their content to human users for free, well enough that only Linux users doing weird stuff get excluded (hint: this is a tiny sliver of traffic, not even a percentage of traffic but more like an occasional nuisance).

• You would kill off search engine competition. Because you benefit from crawlers, you'd zero rate "good" web bots using some whitelist. Now to make a new search engine I have to pay vast sums in bot fees whilst my rich competitors pay nothing. This makes an already difficult task financially insurmountable.

The current approach of using lots of heuristics, JavaScript probes and various other undocumented/obscure tricks works well. Cases like this one are rare, caused by users doing weird stuff like committing protocol violations and such users can typically escalate and get attention from the right operators quickly. There are few reasons to create a vast new infrastructure.

wolco · on Jan 22, 2020

That's how ads work. More visitors more pageviews/clicks. People who serve ads don't want to pay for bots which is why they are a problem.

Doesn't medium do this?

danShumway · on Jan 22, 2020

> That's how ads work. More visitors more pageviews/clicks.

That's not asking people to pay for bandwidth/compute power, it's selling something adjacent to your content that you hope makes up for the loss.

> People who serve ads don't want to pay for bots which is why they are a problem.

That's kind of my point. When you ignore the arbitrage potential of serving requests for free, it forces you to care about making sure that your content is only available to the "right" users. You have to care about things like scraping/bots, because you're not directly covering your server costs, you're swallowing your server costs and just hoping that ads make up the difference.

Theoretically, in a world where server costs were directly transferred to the people accumulating those costs, you wouldn't need to care about bots. In fact, in that world, you shouldn't care whether or not I'm using an automated browser, since digital resources aren't limited by physical constraints.

In most cases, the only practical limit to how many people can visit a website is the hardware/cost associated with running it. A website isn't like an iPhone where we can run out of physical units to sell. So if they're paying for the resources they use, who cares if bots make a substantial portion of your traffic?

> Doesn't medium do this?

No, Medium just sells subscriptions, you don't pay for server usage. As far as I know, no one does this -- probably in part because of problems I haven't thought of, also probably in part because there are no good micro-payment systems online (and arguably no really good payment systems at all).

The closest real-world example is probably AWS, where customers pay directly for the resources they use. But those costs aren't then directly passed onto the user.

wolco · on Jan 22, 2020

If you had to pay for each request you would make fewer and limit the requests to serious requests (school, work, medical).

Having said that, you could provide a central service where people would buy credit to be used on many sites. So the micropay isn't the problem.

oefrha · on Jan 22, 2020

> you could provide a central service where people would buy credit to be used on many sites.

That central service is going to lock out many countries and regions as well as lots of people (minor, unbanked, poor, etc.) in non-locked out countries and regions. Payment is frigging hard especially on the international scale. This is every bit against freedom of information and strictly worse than Cloudflare.