This plan looks even worse for privacy, and seems to rely on authenticating the unique device itself.
> Know your user is coming from an authentic device and signed application, verified by the device vendor directly.
If they manage to get wide adoption of something like this, it seems like a very bad day for privacy, and a joyous day for advertisers, for anyone who wants to prevent web scraping, and for Google and/or anyone who might want to make it hard to crawl the entire web…
I read further, and it sounds like they've designed it with privacy in mind. That's nice.
I'm still having a hard time seeing how this doesn't eventually lead to a completely locked-down internet, where users can only use approved browsers and devices.
They have a list of steps for how a request would be made, where steps 2 and 3 are:
> 2. Safari supports PATs, so it will make an API call to Apple’s Attester, asking them to attest.
> 3. The Apple attester will check various device components, confirm they are valid, and then make an API call to the Cloudflare Issuer (since Cloudflare acting as an Origin chooses to use the Cloudflare Issuer).
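Filling in the steps around those two, here's a toy sketch of the whole round trip as I understand it. To be clear, this is my own illustration, not their code: every function name is invented, a plain HMAC stands in for the real blind-signature scheme (which is what's supposed to keep the issuer from linking a token back to your device), and the header format is only roughly what the Privacy Pass drafts describe.

```python
import base64
import hashlib
import hmac
import os

# Toy model of the Private Access Token round trip. Party names (origin /
# browser / attester / issuer) match the quoted steps; everything else is a
# stand-in for illustration only.

ISSUER_KEY = os.urandom(32)  # shared here only to keep the toy self-contained


def origin_handle(request: dict) -> dict:
    """Step 1: the origin rejects an unauthenticated request with a token challenge."""
    token = request.get("authorization")
    if token is None:
        challenge = base64.b64encode(os.urandom(16)).decode()
        return {"status": 401,
                "www-authenticate": f'PrivateToken challenge="{challenge}"',
                "challenge": challenge}
    # Redemption: the origin checks that the token matches the challenge it issued.
    expected = hmac.new(ISSUER_KEY, request["challenge"].encode(), hashlib.sha256).hexdigest()
    return {"status": 200 if hmac.compare_digest(token, expected) else 403}


def attester_request_token(challenge: str, device_ok: bool) -> str:
    """Steps 2-3: the attester checks the device and, if satisfied, has the
    issuer sign a token. (Really the request is blinded so the issuer can't
    tie the token to the device; this toy skips that.)"""
    if not device_ok:
        raise PermissionError("device attestation refused")
    return hmac.new(ISSUER_KEY, challenge.encode(), hashlib.sha256).hexdigest()


def browser_fetch(device_ok: bool = True) -> dict:
    """Step 4: the browser retries the original request with the token attached."""
    first = origin_handle({})
    if first["status"] != 401:
        return first
    token = attester_request_token(first["challenge"], device_ok)
    return origin_handle({"authorization": token, "challenge": first["challenge"]})


print(browser_fetch(device_ok=True))   # {'status': 200}
# A device that fails attestation never even gets a token:
# browser_fetch(device_ok=False) raises PermissionError
```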
In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like...
Create a competitor to Google? You'd need to crawl the web for that. Would you imagine Apple or Cloudflare would gladly let your device request millions of tokens per hour? Or would that be throttled or disallowed entirely?
Use curl (or telnet, or [any other HTTP client]) to grab a page?
Use yt-dlp to download a YouTube video?
Scrape a bunch of data for an AI project? See this article from the front page, where someone scraped a bunch of car listings from KBB, trained a model to estimate car prices, and found some interesting results: https://blog.aqnichol.com/2022/12/31/large-scale-vehicle-cla... - would something like that be permitted under a system like this? Or might you need to own/rent an army of authorized devices with authorized browsers to do that experiment?
> The Apple attester will check various device components, confirm they are valid
Here's the "you will be under our control" part of their scheme. Running any "unauthorised" software? Rooted/jailbroken? Certain "security" features disabled? Using third-party replacement parts? ... Social credit score too low? Too bad, you're now denied access.
I've implemented PAT on a service. A server would only want to require PAT when it wants to see if it is dealing with a human. In the context of, say, a blog, you would require PAT when someone wants to make an anonymous comment or create an account.
For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
In your proposed scenario, reading the publicly accessible contents of the web, there should be no problems. (Of course some percentage of sites will have accidentally required PAT at any given time and be unscannable, but presumably they figure that out and fix it.)
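To make that read/write split concrete, the gate on my side was roughly shaped like the sketch below. This is simplified for illustration: the action names are made up, and verify_private_token() is a placeholder; in reality the token is verified against the issuer's published public key, not with a string check.

```python
# Reads are served to anyone; only the "human-ish" write paths demand a token.

WRITE_ACTIONS = {"post_anonymous_comment", "create_account"}


def verify_private_token(auth_header: str | None) -> bool:
    """Placeholder: production code verifies the PrivateToken signature
    against the token issuer's public key."""
    return auth_header is not None and auth_header.startswith("PrivateToken ")


def handle(action: str, auth_header: str | None = None) -> dict:
    if action not in WRITE_ACTIONS:
        # Reading: just serve content, no attestation required.
        return {"status": 200, "body": "public content"}

    if verify_private_token(auth_header):
        return {"status": 200, "body": "write accepted"}

    # No (valid) token yet: answer with a challenge instead of refusing outright.
    return {"status": 401,
            "www-authenticate": 'PrivateToken challenge="<b64>", token-key="<b64>"'}


print(handle("read_post"))                                         # served, no token needed
print(handle("post_anonymous_comment"))                            # 401 + challenge
print(handle("post_anonymous_comment", "PrivateToken token=..."))  # accepted
```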
Now for the good side: I, reluctantly, implemented a geolocation filter to control anonymous content additions to that service I was alluding to. I felt bad about it, but I also felt bad having to filter out content spam every day. It turned out that all my strange content spam came from one country, so I banned 143 million people from anonymous content creation for my convenience.
With PAT I can remove the national ban and let any "probably human" in.
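The change itself is basically swapping one predicate for another, something like this (again a toy: geoip_country() and verify_private_token() are stand-ins for the real lookups, and "XX" is the one country from the story):

```python
def geoip_country(ip: str) -> str:
    return "XX" if ip.startswith("203.") else "US"   # toy lookup


def verify_private_token(auth: str | None) -> bool:
    return auth is not None and auth.startswith("PrivateToken ")   # toy check


def may_post_anonymously_before(ip: str) -> bool:
    return geoip_country(ip) != "XX"          # blanket ban on ~143 million people


def may_post_anonymously_after(auth: str | None) -> bool:
    return verify_private_token(auth)         # any "probably human" gets in


print(may_post_anonymously_before("203.0.113.9"))            # False
print(may_post_anonymously_after("PrivateToken token=..."))  # True
```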
> For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
I was going to cite the LinkedIn case where, last I had heard, the courts had decided that scraping was legal...
Headline [0] from April:
> Court rules that data scraping is legal in LinkedIn appeal
> LinkedIn has lost its latest attempt to block companies from scraping information from its public pages, including member pages.
... but upon googling it, I found a more recent [1] ruling :/
> LinkedIn prevails in 6-year lawsuit against data scraper
> The U.S. District Court for the Northern District of California sided with LinkedIn in its six year lawsuit against a firm that scraped data ...
So that sure puts a nail into the argument I was going to make. But still, while I think your use case lines up with the spirit of this kind of system, I think the reality is that it also would be used by every single site with a signup wall to kill off the archive.ph's of the world.
> In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like [compete with apple or google]
A: they won't, and that's the plan. Not to mention that those would then be the only two players (Microsoft a late third) that can both attest you and profile you locally on the device for advertising.
> Talking about the "privacy" of what has been made publicly available makes no sense.
Yes, it does. Users often wish to be able to delete something that was once public, or to make it private. For example, someone could post a picture of themselves on Twitter. A year later they are no longer comfortable having pictures of themselves online, so they go and delete them. Despite the user deleting them, malicious scrapers will keep those images. Another example would be setting your Twitter display name to your real name. Later you aren't comfortable using your real name, so you change it, but scrapers may still have your real name despite you wanting it to be a secret.
> Users often wish to be able to delete something that was once public, or to make it private.
People also wish to be able to do a lot of other things, but that doesn't make it right.
What becomes public history must remain immutable. Otherwise you're just going to encourage a state in which those with the power to do so will destroy and rewrite the past to their advantage, to control the narrative over the population. The trendy phrase "right to be forgotten" is effectively a "right to rewrite history".
It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
> It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
I am going off of Twitter's view: if you store tweets locally, you must listen for when they get deleted and then delete them on your end too. If a scraper is breaking Twitter's rules, I consider that a malicious scraper.
I assume the OP means that scraping is used to collect public data, including that of individuals, which can even be linked across different websites. There are at least a couple of services that try to connect somebody’s Instagram account to their FB/Twitter/LinkedIn etc. I assume some of those rely on scraping (+ username checking), since the TOS for the APIs of those social networks probably prohibit this use case.
Yes, big datasets of user data can be created and sold. This user data can be joined across multiple sites to build up profiles on people. These datasets floating around can harm the reputation of a site.
If you think that's bad, you're not going to like the fact that they're letting the NSA design a system for "TPM-based Network Device Remote Integrity Verification":