Not sure when this article was prepared, but it doesn't mention WEI and other related technologies (remote attestation) which is IMHO the biggest current threat to interoperability. Sure, the "standard" is open, and theoretically anyone can implement it, but when the only "trusted" keys are those owned by Big Tech, it's hard to argue that's actually open.
Eh, I'm barely able to run a search engine due to all the bot activity. Less than 1% of my traffic is human; the other 2.5 million queries per day are from bots. It's only by the grace of Cloudflare it works.
Your mindset ensures the big corporations will be the only actors able to host anything with any level of interactivity. Doesn't sound very open to me.
Well, your mindset ensures the computing freedom we enjoy today will be destroyed. Google is on the verge of introducing remote attestation in browsers so that servers can cryptographically verify any number of things about your client. It's essentially guaranteed that none of those things being verified will be in our best interests.
I'd rather have a static web where I have the power to choose my own browser, inspect source, block ads or even just use curl or Python to scrape something.
What mindset is that, actually trying to build alternatives to FAANG?
I honestly think remote attestation is a smaller problem. At least then we can still build alternative services. I'd rather have a free web and a locked down browser than a free browser and nothing but an endless digital mall to browse. Remote attestation will only lock you out of that endless mall to begin with. The web will find ways of sucking regardless of what Google does.
The mindset of sacrificing computing freedom for user convenience and corporation profitability. We should be absolutely uncompromising on this one value. The alternative is the destruction of everything the word "hacker" ever stood for.
> nothing but an endless digital mall to browse
Funny, that's essentially what the web feels like to me in 2023. Escaping it is the number one reason why I visit HN nearly every day. It's like the only sane website left. I don't even open the links posted here anymore, websites are just unbearable these days even with uBlock Origin turned up to the max. I just assume whatever's important enough will be directly quoted in the comments.
The "static web" you and the commenter below mentioned? I actually kind of want it.
Dude, my struggle is to be able to operate a non-profit service free of charge with no ads, and the aim is to help users find a way out of exactly the aforementioned digital mall and onto the free and independent, mostly static web.
This is getting harder every day due to all the bot traffic helping itself to disproportionate usage of my service via a botnet that is all but indistinguishable from human traffic.
But yeah, must be the corporate profits I'm hoarding...?
Why is it relevant whether the traffic is human or automated? The whole point of the internet is you can put a server out there and anyone anywhere can connect to it with any HTTP client.
To me it seems like the only people who care about that are those who want to sell our attention to the highest bidder via advertising. Wouldn't you be having the same difficulties if there were just as much traffic coming from humans?
I want to provide as many human beings as possible with value by distributing my processing power fairly between them. If I get DDoSed by a botnet, I won't provide anyone with anything other than, optimistically, an error page.
If I had infinite money and computing resources, this would be fine, but I'm just one guy with a not very powerful computer hosted on domestic broadband, and even though I give away compute freely, it just takes one bag of dicks with a botnet to use it all up for themselves, and without bot mitigation, I'm helpless to prevent it.
Oh and I actually do provide an API for free machine access, so it's not like they have to use headless browsers and go through the front door like this. But they still do.
Serves me right for trying to provide a useful service I guess?
Arguably, the problem here is that you want to do it free of charge. That's the problem in general: adtech aside, people want to discriminate between "humans" and "bots" in order to fairly distribute resources. What should be happening, though, is that every user - human and bot alike - covers their resource usage on the margin.
Tangent: there's a reason the browser is/used to be called a user agent. The web was meant to be accessed by automation. When I use a script to browse the web for me with curl, that script/curl is as much my agent as the browser is.
I see how remote attestation and other bot detection/prevention techniques make it cheaper for you to run the service the way you do. But the flip side is, those techniques will get everyone stuck using shitty, anti-ergonomic browsers and apps, whose entire UX is designed to best monetize the average person at every opportunity. In this reality, it wouldn't be possible to even start a service like yours without joining some BigCo that can handle the contractual load of interacting with every other business entity...
(Also need I remind everyone, that while the first customers of remote attestation are the DRM-ed media vendors, the second customer is your bank, and all the other banks.)
The bot detection won't come without cost. It will centralize power in the hands of Cloudflare and other giants. I think it's only a matter of time until they start exercising their powers. Is this really an acceptable tradeoff?
If we do accept it, I think the day will come when Cloudflare starts rejecting non-Chrome browsers, to say nothing of non-browser user agents.
I don't see any good options at this point. The situation profoundly sucks for everyone involved. We're stuck between the almost absurdly adversarial open web, or bargaining with the devil at Cloudflare, and now Google's remote attestation which is basically Google taking a stab at the problem.
To be clear, I don't think remote attestation is a good solution, but it's at least a solution. Any credible argument against Cloudflare or remote attestation needs to address the state of the open web and have some sort of plan for how to fix it. Or at least acknowledge that's what Google and CF are trying to solve. Dismissing the problem as a bunch of mindless corporate greed just doesn't fly. It affects anyone trying to host anything on the Internet, and is only getting worse. The status quo, and where it's heading, is completely untenable.
It's easy to say well just host static content, but that's ceding all of Internet discovery and navigation and discussion and interactivity to big tech, irreversibly pulling up the ladder on any sort of free and independent competition in these areas. That's, in my opinion, a far greater problem.
Yes, I agree with you. It sucks having to make these choices and compromises. The adversarial nature of the web is difficult for service providers but it's actually ideal for users. We all benefit from being able to connect to servers using any browser, any HTTP client. This is especially true when the service providers don't like it. Software like yt-dlp is an example of software that interoperates adversarially with websites, empowering us.
I apologize if I came off as aggressive during my argument. It was not my intention. I think we reached the same conclusion though.
Maybe the true problem is bandwidth is too expensive to begin with. Would the problem still exist if the costs were negligible?
Network bandwidth cost is negligible; it's hardware and processing power that's expensive. Each query I process is up to a 100 MB disk read. I only have so much I/O bandwidth.
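To put rough numbers on that, here's a back-of-the-envelope sketch. The 100 MB worst-case read is from the comment above; the disk bandwidth figure is an invented assumption (a mid-range SSD), not a measurement of the actual machine:

```python
# Back-of-the-envelope I/O budget for a query-serving box.
# DISK_BANDWIDTH_MB_S is an assumed figure; WORST_CASE_READ_MB is
# the "up to 100 MB per query" from the comment above.
DISK_BANDWIDTH_MB_S = 500
WORST_CASE_READ_MB = 100

queries_per_second = DISK_BANDWIDTH_MB_S / WORST_CASE_READ_MB
queries_per_day = queries_per_second * 86_400  # seconds per day

print(f"{queries_per_second:.0f} worst-case queries/s, "
      f"{queries_per_day:,.0f} per day")
# -> 5 worst-case queries/s, 432,000 per day: well under the
#    2.5 million daily bot queries mentioned upthread.
```

So even under generous assumptions, the bot load alone can exceed what one disk can physically serve in the worst case.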
As far as I see it, there are two bad solutions to this problem.
The first bad solution is to have a central authority inspect most of the web's traffic and try to deduce who is human. This is the approach taken by Cloudflare, but essentially the same as Remote Attestation. It gives the chosen authority a private inspection hatch for most of the web's traffic, as well as unfettered authority to censor and deny service as they see fit.
The other bad option is a sort of 'free as in free enterprise' Ferengi Internet where each connection handshake involves haggling over the rate and then each request costs a fraction of a cent. This would remove the need to de-anonymize users, likely kill the ads business, and virtually eliminate DDoS/sybil attacks. It would also be an enormous vector for money laundering, and as a cherry on top would make running search and discovery services much more expensive. I do think the crypto grifters pretty solidly killed the credibility of this option.
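A toy model of that 'Ferengi Internet' economics: the server quotes a per-request price (think HTTP 402), and a sybil attack suddenly has a sticker price instead of being free. Every number and name here is invented for illustration:

```python
# Toy per-request pricing model. The price is an arbitrary assumption:
# a twentieth of a cent per request, in dollars.
PRICE_PER_REQUEST_USD = 0.0005

def pay_and_retry(budget_usd, n_requests):
    """Client side: return total cost if the budget covers it, else None
    (meaning the traffic is unaffordable and never gets sent)."""
    cost = n_requests * PRICE_PER_REQUEST_USD
    return cost if cost <= budget_usd else None

# A human doing 1,000 searches pays 50 cents.
print(pay_and_retry(budget_usd=100.0, n_requests=1_000))

# A botnet doing 10M requests/day would owe $5,000/day:
# negligible friction for a person, ruinous for a DDoS.
print(pay_and_retry(budget_usd=100.0, n_requests=10_000_000))
```

The point isn't the mechanism (payment channels, handshakes, etc. are all hand-waved away) but the asymmetry: pricing at the margin hurts whoever sends the most requests.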
Xanadu comes close to this "Ferengi Internet" mindset with some of the tactics it chooses for monetization of content, albeit from an entirely different angle (enabling remix culture more or less indiscriminately while preserving the sanctity of the existing copyright system and enabling royalties to flow to authors in proportion to how their works are used and reused).
> The other bad option is a sort of 'free as in free enterprise' Ferengi Internet where each connection handshake involves haggling over the rate and then each request costs a fraction of a cent.
> This would remove the need to de-anonymize users, likely kill the ads business and virtually eliminate DDoS/sybil attacks.
Sounds like a massive win to me on all fronts. I agree with you.
> It would also be an enormous vector for money laundering
I don't mind. If that's the price, I pay it gladly.
Is this a purposeful DDoS or just bots trying to scrape results? If this is a DDoS on purpose, what's their financial gain? Did they demand payment?
If you're talking about bots scraping content, then the question is also why. Perhaps by letting them do so, you indirectly provide value to even more human beings?
It's entirely possible that these questions are absurd. However, since scraping with headless browsers is not free, there must be some reason for scraping a given service... and it's usually something that, in the end, benefits more human beings.
Best guess is it's some attempt at blackhat SEO, to manipulate the query logs and typeahead suggestions (I don't have query logs but whatever, maybe they think I secretly forward queries to Google or something).
But really, fuck if I know. I've received no communication, so I can only guess as to what they're trying to do. I have a free public API they're more than welcome to use if they want to actually use the search engine, but they still go through the public web endpoint via a botnet.
I've talked to a bunch of people operating other search engines, and all of them are subject to this type of 24/7 DDoS. It's been going on for nearly two years now.
>Why is it relevant whether the traffic is human or automated?
Because all traffic costs the service provider money, and thousands of automated users can be run more cheaply than one human user (who, after all, is bounded by time and the cost of computation and bandwidth). Automated traffic has no such bounds, giving it the opportunity to DoS you, either on purpose or just accidentally.
Rate limits do bupkis against a botnet. It's not possible to assume that each IP or connection is one person. The crux that initiatives like remote attestation are trying to solve is that, as things stand, one person may command tens of thousands of connections, and from the server's standpoint there's really not much you can do to allocate resources fairly.
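A minimal sketch of why per-IP limiting fails here. This is a generic fixed-window limiter, not any particular service's implementation, and the IPs and limits are made up:

```python
from collections import defaultdict

class PerIpRateLimiter:
    """Naive fixed-window limiter: allow at most `limit` requests
    per IP per `window_s`-second window."""
    def __init__(self, limit=10, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)  # (ip, window index) -> count

    def allow(self, ip, now):
        key = (ip, int(now // self.window_s))
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = PerIpRateLimiter(limit=10, window_s=60)

# One abusive user behind one IP: capped at 10 requests per minute.
single_ip = sum(limiter.allow("203.0.113.1", now=0) for _ in range(1_000))

# The same user behind a 10,000-node botnet: every request sails
# through, because no single IP ever exceeds its per-IP limit.
botnet = sum(limiter.allow(f"10.0.{i // 256}.{i % 256}", now=0)
             for i in range(10_000))

print(single_ip, botnet)  # -> 10 10000
```

The limiter does exactly its job, and the botnet still gets a thousand times more throughput than the limit was supposed to allow any one person.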
You're the first person to say anything about CAPTCHAs. The guy who started this argument, needing some way to sort out human traffic, operates a free service and is complaining that the bot traffic makes it hard to keep the service free, since bots cost money.
The problem is allocating resources fairly. A single human may operate tens of thousands of bots, and thus use a disproportionate amount of resources, possibly all of them.
Since nobody is proposing alternative solutions that actually work, people are looking to Cloudflare and Google for help with the real problem that impacts them right now.
People are looking for solutions to:
- How do we only allow real humans to access it so we stop wasting money on handling spam requests?
- How do we permanently ban a known malicious individual from accessing it?
Sorry to disappoint you, but web integrity doesn't work either. On Android, where you have the Play Integrity API, bot farms are still alive and kicking.
Third-party ROM users, on the other hand, are affected, and third-party browser vendors will be similarly affected if this is pushed forward, reducing competition in the space.
The whole thing is a complete failure on Android, so I don't see any argument for why we should also suffer it on the web.
Spammers and fraudsters are a problem, sure, but the biggest? By what measure?
If we're trying to predict which threats are "big" enough to lead to system failure, the analysis is quite different. In natural systems, parasites tend to fill niches and can persist for lengthy periods, often as long as the host. Or longer.
Think of it as a historically situated evolutionary battle. Across many scenarios, there are many failure modes, and one way to tease apart the likely causal threats is to think through those scenarios systematically.
Under what conditions do you think spam/fraud would (more or less) 'destroy' the open web? And what does that destruction look like to you?
Time and money lost to them. The idea of creating a web closed to those people starts to look very attractive when you have to spend time, multiple times a day, cleaning spam out of your site.
>Under what conditions you think spam/fraud would (more or less) 'destroy' the open web?
It's already happening.
>And what does that destruction look like to you?
The destruction looks like more websites being able to offer free or cheap services. A great reduction in spam comments. More effective ads. For good actors, nothing really changes.
The biggest threat against the open web is Google who wants to turn the web into yet another AOL with vertical integration.
It is the corporation that wants to put malware on your computer under the pretense of "verifying content integrity", just to force people to see its ads and to protect its ad business from spammers and fraudsters.