Then you'd just give visitors of your websites no recourse and no information whatsoever on how to fix the problem. The benefit of client-side CAPTCHA is that humans can at least pass it and fix the problem, even if something they don't control (such as their IP address having a bad reputation due to a shitty ISP) is causing problems.

As a website operator it's easy to look at the spam that is getting through and be happy that it's zero. But do you have any idea how many actual humans you have incorrectly rejected? You don't have that data, and it's really easy to screw up there.

Of course, if your website is small, nobody cares. If you are bigger, like Stripe, you simply get bad publicity on HN. People on HN love to hate on mysterious bans and blocks that hit them just because they did something slightly unusual and your backend-only analysis flagged them as suspicious.

Abuse fighting is hard.



>Then you'd just give visitors of your websites no recourse and no information whatsoever on how to fix the problem.

This is a weird assumption. What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

You obviously don't want to give away enough to help bot developers get through your system, but that's not the same as no recourse and no information.
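To sketch what I mean (Flask just as an example; the scoring helper and contact address are hypothetical): the backend can stay vague about why it refused while still being specific about what to do next.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def spam_score(form):
        # Hypothetical heuristic scorer; returns 0.0 (clean) to 1.0 (spam).
        return 0.0

    @app.post("/contact")
    def contact():
        if spam_score(request.form) > 0.9:
            # Vague about *why* (don't tip off bot authors),
            # specific about the recourse.
            return jsonify(
                error="We couldn't accept this submission automatically.",
                recourse="Email us at support@example.com and we'll sort it out.",
            ), 403
        return jsonify(ok=True)  # normal processing would go here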

>But do you have any idea how many actual humans you have incorrectly rejected?

Yes - like I said in my other comment, this new system actually logs all submissions. It just puts the ones it identifies as spam into a separate folder. Akismet also has the ability to mark things as false positives or false negatives.
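Roughly, the flow looks like this (using Akismet's classic REST endpoints; the folder layout and field names are simplified stand-ins, not the actual system):

    import json, pathlib, requests

    API_KEY = "your-akismet-key"  # placeholder
    BASE = f"https://{API_KEY}.rest.akismet.com/1.1"
    SITE = {"blog": "https://example.com"}

    def handle_submission(sub_id, message, ip, user_agent):
        params = dict(SITE, user_ip=ip, user_agent=user_agent,
                      comment_content=message)
        is_spam = requests.post(f"{BASE}/comment-check", data=params).text == "true"
        # Every submission is kept; suspected spam just lands in its own folder.
        folder = pathlib.Path("submissions/spam" if is_spam else "submissions/inbox")
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{sub_id}.json").write_text(json.dumps(params))

    def mark_false_positive(params):
        # Tell Akismet it got one wrong so it can recalibrate.
        requests.post(f"{BASE}/submit-ham", data=params)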

I think that automated form spam is very context-specific. So, the example I wrote about is for a marketing site, and it's a business that primarily targets other businesses. Most of the spam it gets is for scummy SaaS software, SEO services, etc...

But my personal website has a very simple subscribe-by-email form. There were definitely a few spam submissions - someone just blasting out an email address and signing it up to whatever form would accept it. When I implemented double opt-in: gone entirely.
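Double opt-in fits in a few lines; here's a sketch (the mailer and in-memory stores are stand-ins for a real email service and database):

    import secrets

    pending = {}          # token -> email (use a real database in practice)
    subscribers = set()

    def send_email(to, body):
        print(f"To {to}: {body}")  # stand-in for a real mailer

    def request_subscription(email):
        token = secrets.token_urlsafe(32)
        pending[token] = email
        send_email(email, f"Confirm here: https://example.com/confirm?t={token}")

    def confirm(token):
        email = pending.pop(token, None)
        if email is not None:
            # Bots that blast addresses into forms never click this link.
            subscribers.add(email)
        return email is not None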

My larger point was that as an industry, we seem to have just capitulated to client-side CAPTCHAs. And it sucks. It's one of the many shitty things about the modern web. But I think it's become just an assumption that it's needed, and we haven't reexamined that assumption in a while.

I think it'd almost be better for there to be something you could spin up in a container that has a base machine learning model, but can "learn" as you manually label messages, and then you can also choose a threshold based on your comfort level.
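Something like this, say, with scikit-learn's out-of-core API (HashingVectorizer is stateless, so partial_fit keeps learning from each label without retraining from scratch; the default threshold is arbitrary):

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vec = HashingVectorizer(n_features=2**18)
    clf = SGDClassifier(loss="log_loss")  # log loss so predict_proba works

    def learn(message, spam):
        # Called each time you manually label a message as spam (True) or not.
        clf.partial_fit(vec.transform([message]), [int(spam)], classes=[0, 1])

    def looks_like_spam(message, threshold=0.8):
        # Raise the threshold if false positives worry you more than spam.
        return clf.predict_proba(vec.transform([message]))[0, 1] > threshold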


I think the idea here is that 1% of users with the shitty ISP is going to have a much worse experience than anyone was having with the captcha. This is super context-dependent, but being told I need to "contact an administrator" when I submit a form on a website is a good way to make me log out and look for alternatives to whatever service I'm using.

To me, the question is this: would you rather give 100% of your users a kinda shitty experience, or 99% of your users a normal experience and 1% a nightmarish, awful, shitty experience? The answer probably depends on the use case.


> This is a weird assumption. What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

Not a weird assumption, but a necessary assumption based on considerations of scale.

A small-scale website that doesn't receive too many spam attempts can have human agents manually classify spam. A medium-scale website can use a CAPTCHA to let some visitors through while the rest go to human verification. You appear to be in this bucket. When the scale is huge, no other alternative way to contact exists. CAPTCHA becomes your only tool.

In other words, CAPTCHA is only necessary because of scale; what do you think the first A stands for? But because of scale, alternate ways stop working.


>When the scale is huge, no other alternative way to contact exists

1. This still doesn't preclude giving a blocked user recourse or information. Like how a streaming website will say "Hey, you're using a VPN. We don't allow that" - the user's recourse is to turn off the VPN, or find a new VPN that their service won't detect.

2. The case you're outlining is different from the scenario in which most users are presented with a CAPTCHA. I encounter it when I am using a VPN and Googling something in Incognito mode. That means Google has already applied some heuristics and thinks the chances are higher than normal that I'm a bot (not logged in, no cookies allowed, masked IP address) before presenting the challenge. In those cases, you're probably correct that presenting a CAPTCHA is a reasonable option. I just think it's weird to have CAPTCHA be the default/first line in many cases. Especially with the focus on things like converting users.


> Like how a streaming website will say "Hey, you're using a VPN. We don't allow that" - the user's recourse is to turn off the VPN, or find a new VPN that their service won't detect.

No, the user's recourse is to stop using the streaming website and go back to piracy instead.

Any speedbump to UX is a lost customer. You cannot and should not assume that users are going to jump through hoops, because the overwhelming majority will not.


I mean, the vast majority of people will not "go back" to piracy. Piracy isn't an option that's on the table for them. But you're missing the point.

>Any speedbump to UX is a lost customer. You cannot and should not assume that users are going to jump through hoops

So... CAPTCHA isn't a hoop? Both scenarios are hoops.


> What's preventing a backend system from saying "Hey, we think you're a bot. Here's an alternative way to contact us."

In what way? If I got flagged and had to take additional steps to remedy a form submission I would probably just never go back to the site. The only way this could work is by identifying the issue in real-time and then sending a CAPTCHA to be completed by the user client-side while they're still handling the form.


> to remedy a form submission

The correct way to deal with an error in general is to return the form, as filled in by the user, on the error page. Sadly, so many SPAs ignore this that I usually copy any meaningfully sized text manually before submitting.
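Concretely, something like this (Flask/Jinja assumed; the validator and template names are made up):

    from flask import Flask, render_template, request

    app = Flask(__name__)

    def validate(form):
        # Hypothetical validator: returns a list of error messages.
        return ["Message is required"] if not form.get("message") else []

    @app.post("/comment")
    def comment():
        errors = validate(request.form)
        if errors:
            # Echo every submitted field back into the template so the
            # user's text survives the round trip.
            return render_template("form.html", errors=errors, **request.form), 400
        return render_template("thanks.html")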

> I would probably just never go back to the site

Of course you would need to measure and reasonably reduce false positives. But if you are serious about getting users to report them, an effective solution I've found with minimal friction is to fully use the mailto protocol scheme. Online shops can be static sites up to a certain scale, by adding product IDs and quantities to the mailto body and having the customer order via e-mail.
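For illustration, the link generation is just URL encoding (the address and SKUs here are made up):

    from urllib.parse import quote

    def order_link(cart, to="orders@example.com"):
        # cart: list of (product_id, quantity) pairs baked into the static page.
        body = "\n".join(f"{qty}x {product_id}" for product_id, qty in cart)
        return f"mailto:{to}?subject={quote('Order')}&body={quote(body)}"

    print(order_link([("SKU-1042", 2), ("SKU-0007", 1)]))
    # mailto:orders@example.com?subject=Order&body=2x%20SKU-1042%0A1x%20SKU-0007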

> The only way this could work

Depending on your target audience, a CAPTCHA might not be possible.


>an effective solution I've found with minimal friction is to fully use the mailto protocol scheme.

This is another assumption that I am not so sure is valid anymore: that there are hordes of bots out there scraping every email they can find.

My personal website has had a raw mailto "Contact" button for several years now. Earlier this year I changed that email to an address I only use for the website (it's an alias) just as a way of tracking what comes through, and I have not received a single spam email to that address. Maybe I am tempting fate by putting that out there, and some asshole is going to try to ruin it for me. But it's my experience.

I'm more likely to get an email from a recruiter who has used a tool to scrape my email off of GitHub (though I've made that significantly harder, plus nobody wants to hire programmers anymore: the ultimate spam control!) than from one who clicked through to my website and used the Contact button.

I've gotten several real people sending me legitimate emails through it though. Sometimes they read something I post here, or they find a post from Google. Or it's someone I haven't talked to in a long time and they don't have a current email for me but they are able to reach me via my website.

Here's some cold emails I've been sent over the past 1-2 years: https://i.imgur.com/rMfOqb2.png

My website isn't big, but it is fully indexed by Google and other search engines and has backlinks. If the internet was really this dark forest where there are masses of bots out there ingesting every email they can find, surely I'd have gotten something by now.

I don't research this stuff; I can only share my anecdotal experiences. But it makes you wonder, right?


And just to be clear, it doesn't need to be a choice between captchas and heuristic abuse detection on the backend. In the ideal case you're making a decision in the backend using all these heuristics and signals, but the outcome is not binary. Instead, the outcome of the abuse detection logic is to choose from a range of options: blocking the request entirely, allowing it through, using a captcha, or, if you're sophisticated enough, doing other kinds of challenges.
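In code, that might look something like this (signal names, weights, and cutoffs are purely illustrative):

    from enum import Enum

    class Action(Enum):
        ALLOW = "allow"
        CAPTCHA = "captcha"
        EMAIL_CHALLENGE = "email_challenge"
        BLOCK = "block"

    def decide(signals):
        # Combine signals into one risk score, each normalized to 0..1.
        score = (0.4 * signals["ip_reputation"]     # 0 = clean, 1 = known-bad
                 + 0.3 * signals["velocity"]        # request rate vs. baseline
                 + 0.3 * signals["content_score"])  # classifier output on payload
        if score < 0.2:
            return Action.ALLOW
        if score < 0.5:
            return Action.CAPTCHA
        if score < 0.8:
            return Action.EMAIL_CHALLENGE
        return Action.BLOCK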

But proof of work has basically no value as an abuse challenge even in this kind of setup; the economics just can't work.
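A quick back-of-envelope shows why (every number here is an assumption for illustration, not measured data):

    # A real user on a mid-range phone tolerates maybe a few seconds of PoW;
    # an attacker solves the same puzzle on rented hardware, faster and in parallel.
    legit_wait_s = 5            # assumed delay a human will tolerate
    attacker_speedup = 100      # assumed server/GPU advantage over a phone
    cloud_cost_per_hour = 0.50  # assumed spot price of that hardware, USD

    attacker_time_s = legit_wait_s / attacker_speedup
    cost_per_solve = cloud_cost_per_hour * attacker_time_s / 3600
    print(f"~{cost_per_solve * 1e6:.0f} USD per million requests")
    # ~7 USD per million: noise next to spam revenue, while every real user
    # still burned five seconds of battery.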


Client-side captchas are obfuscated code, so the bar for “the user can debug the problem and fix it themselves” is pretty high.

Also, reCaptcha definitely both engages in hell-banning and allows incorrect answers to pass the test. I assume the logic for those things is mostly server-side.



