Web fingerprinting is worse than I thought (bitestring.com)
620 points by Bright_Machine on March 21, 2023 | hide | past | favorite | 510 comments


Fingerprinting enables terrible things in big-tech data collection, and at the same time it's excruciatingly hard to protect against bots, spammers, fraudsters, etc. without it.

Few people seem to try to reconcile this, since neither side cares about the other.

I personally think that discussing fingerprinting as raw tech, without mentioning the size of the company collecting the data or the purpose, is meaningless, and only leads to a few tech-savvy users having less data collected on them.

Most people want to use JavaScript, use the default settings, and not be afraid of clicking on links. I can't really see a good solution without coordination between regulation and tech standards, so I'm at least hopeful for decent solutions.


You don't need to precisely identify users across sessions without their consent to detect bots. Advanced anti-bot systems make heavy use of behavioral biometrics and don't rely too heavily on fingerprinting, mostly because fingerprints are easy to spoof in general; generating human-like mouse data is a bigger challenge.


Sure, but on the other hand, a lot of anti-fingerprinting efforts strive to reduce the info available including things like mouse movement data.

Mouse movement data is a fairly potent fingerprinting vector. Bucketing the average mouse speed and acceleration rates could provide useful information. This may imply specific OS speed settings, or physical mouse DPI. A machine learning system would likely be able to distinguish traditional mouse, vs trackpoint, vs touchpad, vs trackball. Etc.

Also, it is not just bots that have non-human-like mouse movement. Many assistive technologies would have no mouse movement, or would auto-snap the mouse to the relevant spot. That is actually quite powerful for fingerprinting, since assistive technology users are a pretty small subset of internet users, so only a relatively small amount of additional data is needed to uniquely fingerprint that user/machine.
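As a sketch of how such bucketing could work, here is a hypothetical feature extractor over (time, x, y) pointer samples; the bucket width is an arbitrary choice for illustration:

```python
import math

def motion_features(samples, bucket=50.0):
    """Bucket average pointer speed from (t, x, y) samples.

    `samples` is a list of (time_seconds, x_px, y_px) tuples; `bucket`
    is the bucket width in px/s. Narrow buckets leak more identifying
    bits, wide buckets fewer.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    if not speeds:
        return None
    avg_speed = sum(speeds) / len(speeds)
    # Coarse bucketing: the returned integer is one fingerprint feature.
    return int(avg_speed // bucket)
```

A real system would add acceleration, curvature, and pause statistics on top of this, but even a single coarse bucket narrows the user population.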


I wonder if that would be enough to precisely identify a single user among millions like regular fingerprinting can already do, but yeah, it's still a big fingerprinting vector.



Bézier curves are easily detected by machine learning models as non-human; that software won't work on Akamai or any decent anti-bot.


I wonder if you could use a chicken like in the old chicken tic-tac-toe machines to mimic real user behavior.


Disabling JavaScript does not stop fingerprinting either. HTTP headers are sufficient to construct unique user identifiers. Passing that data via API to a FaaS provider would enable cross-site tracking that's invisible to the visitor.

Edit: The required FaaS implementation is trivial too. I could launch an endpoint that performs exactly this function in 30-60 minutes.
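A minimal sketch of such a header-based identifier, assuming a hypothetical tracker that hashes a stable subset of passively received headers (the chosen subset is illustrative):

```python
import hashlib

def header_fingerprint(headers):
    """Hash a stable subset of request headers into an identifier.

    The header subset here is illustrative; a real tracker would also
    fold in TLS and TCP-level signals for extra entropy.
    """
    keys = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")
    material = "\n".join(f"{k}:{headers.get(k, '')}" for k in keys)
    return hashlib.sha256(material.encode()).hexdigest()[:16]
```

Any two visitors whose browsers send identical values for these headers collide, which is why headers alone give a coarse fingerprint; combined with IP and CSS-derived signals the buckets get much smaller.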


In fact, the disablement of JavaScript itself is a very identifying characteristic.


It's one added bit versus countless bits that can no longer be probed. Yeah, disabling JS alone is not enough, but it is not useless either.


Not all bits are equally distributed. If very few people have that bit set, then it is very differentiating.


> I can't really see a good solution without a coordination of regulation...

Totally agree that this is perfectly within the government's purview, and they should be doing something about it. But, as with anything else in the US, until a Fortune 100, some few 1%-ers, or the deep state MIC wants it, we're not going to be getting it.


Until everyday people realize they’re being stalked, I don’t know what will change. I am seriously thinking about trying to go through the proposition process in my state to forbid selling of data (this should already run afoul of wiretapping laws, imho).

I thought having an ad campaign that targeted subgroups very specifically and boldly might be enough to drum up public interest. Something like: “Hello $name from $city. How did $recent_embarrassing_purchase work out? I hope you enjoy your birthday in $birth_month.” And then a link to the proposed policy.

Unfortunately, marketers have neither scruples nor the ability to control themselves and have captured an asymmetric advantage. Technologists do what they do, preoccupied with whether or not they could, not stopping to think if they should. It seems like legislation may be the only remaining option.


Pretty much what Signal did a few years ago [1], but on a bigger scale. Sadly, Facebook banned their ads account and they couldn't take it further; it would be interesting if someone tried the same.

[1] https://signal.org/blog/the-instagram-ads-you-will-never-see...


People realise they're being stalked; they just don't know what that means.

Techie people are convinced non-techie people don't know they're being tracked. They do! Ask your smart non-techie friends what they think about online privacy. I guarantee you they'll say something like "yeah, I know it's probably tracking me, but whachya gonna do".

Thanks to this disconnect, we have so many privacy campaigns with a message like "Did you know you can be uniquely identified on the web?", but so few (none?) that actually proceed to explain why that's bad, and what someone could do with that information. That's the missing piece. Give average people an actual reason to dislike or fear tracking, not just the mere curio that it exists.


I will admit that it always confused me why the browser has access to detailed hardware information. I can understand OS. I can understand resolution. I can rationalize GPU. I don't understand, though, why it should be able to access... well, everything about the machine.

edit: It is still impressive. Even with the firefox settings on, the website was able to identify me. I am not entirely certain how I want to approach this.


> I can understand OS. I can understand resolution. I can rationalize GPU.

None of these should be available to websites by default. The first two come from simpler times when people were not as concerned with privacy implications. The third has been and continues to be pushed by advertising companies (Google, Apple, Microsoft).


edit/ update from original post cuz i cant edit anymore

So quick update since I am mildly obsessive.

I was sure it was either GPU, CPU or addons that were giving me away ( I do have a mildly unique setup ).

I ran a few tests in a VM, and the moment I dropped GPU passthrough (leaving CPU passthrough), I was no longer (based on that website, anyway) tracked across sessions.

In other words, cat and mouse game continues.


Because the browser has become a vendor neutral, architecture neutral app engine and people want to do things like play MIDI instruments, use serial ports, use proprietary USB check scanners for accounting/ERP apps that work on the web and don't need SCCM to manage, etc.


yeah, but it should ask you if you want to allow the website to know this kind of stuff instead of just allowing it by default


> people want to

Some people want to do those things and for very specific websites. Most people don't even know what MIDI or serial ports are.


I would assume for more advanced browser features, like 4k video playing, that hardware information could tell the player whether your machine is capable of playing back 4k video without stuttering.


It got me on iPhone through a VPN change, cleared cache, private window, and reboot.

I know what to think about this… I fucking hate it.


So I just found that the "SnowHaze" browser prevents fingerprinting on fingerprint.com and https://browserleaks.com/canvas

https://apps.apple.com/nl/app/snowhaze/id1121026941?l=en

https://github.com/snowhaze/SnowHaze-iOS


Note that there's also:

  Settings > Safari > Advanced > Experimental Features
where you can disable OpenGL and such (I haven't tested yet).


That I can relate to, but the more immediate question is whether you are willing to adjust your habits to nullify its impact. Most people would not.


> Until everyday people realize they’re being stalked, I don’t know what will change.

FTFY: People already know; nothing will change.

Many of the things that are happening (at least in the US) are deeply, deeply unpopular, but are not changing, and show no signs that they are even susceptible to change. Fortune-sized companies, the 1%, and the deep state are calling the shots, despite how much can be seen in real time through things like Twitter and TikTok. I've actually had to pull back from Twitter because of all the things that are obviously beyond the pale, yet will never change. (Snowden, Assange, et al.)


That’s why I, unfortunately, think legislation is necessary. My state allows citizen proposals with 250k signatures to get on ballot and >50% support to become law that cannot be overturned by the legislature (that has its own issues, but in this case it would be binding).


>> I thought having an ad campaign that targeted subgroups very specifically and boldly

This has been tried by a guy who placed Facebook ads like these. FB blocked his account within a few hours.

So: good in theory, won't work in practice.


I would consider donating to such an effort. I'm sure there are others like me.


Yeah man, I think that's the only way anything is going to change

People are such dumb fucking cattle that they'll lash out at you rather than the data brokers or the software vendors who ratted them out though


> they'll lash out at you

Not only that, but they might have a legal case against you. I've been slowly working through Seek and Hide: The Tangled History of the Right to Privacy, and my main takeaways have been:

(1) The constitutional right to free speech and a free press is not as broad as most people probably think.

(2) Truth is not necessarily an air-tight defense in a case of libel, as courts at various times and places have decided against publishers for true but embarrassing things intended to humiliate or harm.


Maybe the trick would be to put it into a security envelope so you don't disclose anything... although I personally love the idea of printing it on a postcard, since it's practically public record once a data broker gets his filthy paws on it anyway.


Ad platforms aren't that stupid.


Do it.


Ha! I followed the instructions and went to fingerprint.com and it all 'crashed' because I had JavaScript turned off—that's my normal default setting.

I have five different browsers on my smartphone and three on the PC all sans JS and none of them are Chrome. Also, normal operation is to automatically delete all cookies at session's end.

My smartphone and PCs are de-googleized and firewalled and I never see ads in my browsers nor in apps. The apps are mainly from F-Droid and sans ads and the few Playstore ones I use are via Aurora Store and are firewalled from the internet when in use. Honestly, I cannot remember when I last saw an app display an ad, it has to be years back.

In the past I used to go to more extensive measures to stop the spying but I found it was unnecessary as the spy leakage was essentially negligible with much less stringent efforts.

It's pretty easy to render one's online personal data essentially worthless if one wants to. On the other hand, if you insist on using JS, Gmail, Google search, Facebook, etc., then you're fair game and you only have yourself to blame if your personal data is stolen.


Before you get all jubilant, note that they have fingerprinting techniques which don't use JS[0]. It was able to identify me. Contrary to popular opinion, disabling JS doesn't protect you from fingerprinting.

They describe their approach[1]. They use HTTP headers and conditional requests triggered by CSS media queries to gather data. Something like @media(...) {background: url(/tracking/$clientid)}. But in principle, they could also try to fingerprint the TCP/IP stack or the TLS implementation. I'm not sure it would get them more data than OS+browser, though.

[0] https://noscriptfingerprint.com/

[1] https://fingerprint.com/blog/disabling-javascript-wont-stop-...
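A rough sketch of the CSS-beacon idea described above, written as a hypothetical Python generator (the endpoint path and probe set are invented for illustration; each rule assumes a matching element exists on the page, so a matching media query makes the browser fetch a URL encoding one probed feature, no JS required):

```python
def tracking_css(client_id, beacon_base="/tracking"):
    """Generate CSS whose media queries trigger identifying requests.

    `beacon_base` is a hypothetical logging endpoint. The server learns
    which probes matched by observing which URLs each client fetches.
    """
    probes = {
        "dark": "(prefers-color-scheme: dark)",
        "wide": "(min-width: 1920px)",
        "hidpi": "(min-resolution: 2dppx)",
    }
    rules = []
    for name, query in probes.items():
        rules.append(
            "@media %s { #probe-%s { background: url(%s/%s/%s); } }"
            % (query, name, beacon_base, client_id, name)
        )
    return "\n".join(rules)
```

Each media feature contributes roughly one bit; stacking dozens of probes (fonts, viewport ranges, pointer type, color gamut) is what makes the no-JS fingerprint viable.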


"Before you get all jubilant, note that they have fingerprinting techniques which don't use JS[0]. It was able to identify me."

I didn't detail every protection I've put in place or the post would have been too long. However, I'd suggest that spreading my browsing over at least eight browsers (and I actually use more than two machines and do so at different locations and with different ISPs) effectively reduces my profile across the net.

I also use randomized browser user agents and clean links, occasionally I'll even cut-and-paste links between multiple browsers in a single session. I often do this on HN not to hide from HN but for convenience when multitasking. (Having worked in surveillance professionally, this modus operandi just comes naturally, it's now second nature for me to work this way.)

Working with multiple browsers and multiple machines also solves the problem when on rare occasions I have to use JS. That said, I never watch YouTube with a JS-enabled browser, instead I'll use NewPipe or similar. There are other measures I could list but you get the idea. Oh, and I never use the internet on a smartphone with a SIM enabled, instead the SIM resides in a separate portable router and my 'real' phone is a dumb feature phone, it's only capable of making phone calls.

I really don't care if some stuff leaks but I've satisfied myself it's pretty trivial, as frankly, I've not had one indication over the past 20 or so years that I've been targeted as a result of fingerprinting. It's not necessary to make things completely watertight, I'm not trying to hide from the NSA or GCHQ, etc. (and it'd be unsuccessful and a complete waste of time to bother trying).

Moreover, even if something were to leak, I'm simply not a revenue-making target—that means I never respond to any targeted marketing because I simply never receive any.


FF mobile gives me different IDs each time I run a new private session on both the JS and non-JS demos (I usually run without JS AND have the resistFingerprinting setting enabled).


It’s one thing to generate the same hash for the same client, but it would be interesting to know how often the hashes collide, too.

I also notice that the no-JS hash changes when I move the window to a different monitor.


To me this seems extremely elitist. Non-technical people deserve to have their personal data stolen because they don't know about javascript for example?


Technical defenses are never perfect. In a sense they provide security through obscurity, as evinced by the comments above regarding Stallman's use of wget. If everyone applied technical defenses equally then workarounds would quickly be found, and everyone would be equally vulnerable. So privacy is a scale, and being in the minority provides its own defense. If in aggregate each individual is equally valuable, then the value of breaching a minority's technical defenses is some inverse multiplier of the minority's size. Personally my threat model is to put in just enough work to never be the juiciest target.


I run a similar setup to the OP when browsing the modern web, but I think it is in a way our responsibility as professionals to help the less tech-inclined navigate the sea of monsters the modern web has become.

For example: I have set up the systems of family members, for whom I am some sort of digital janitor, with a nice collection of Firefox plugins to get rid of the worst offenders.


I get where you're coming from, but...

If you continue to willingly use socials like FB, TikTok, et al, your complaints about stolen personal data fall on deaf ears. Show me that you don't have those apps installed or do not visit their websites, then we can talk about being serious on deserving to not have data stolen.


"To me this seems extremely elitist."

Right, it probably is. But the issue of stolen personal data has been around for so long that nontechnical people have had years to develop political lobbying and to swing elections to put a stop to it.

The fact is that most people don't give a damn about such matters, if most did then the problems would be behind us by now.

Thus, unfortunately, with the internet it's every man and woman for him or herself. QED!


Have you ever tried to talk to "non-technical" people about this subject? They treat you like you're one of those tinfoil hat crazies.

At this point I'm 100% OK with us being the only ones able to protect ourselves. We warned them and they didn't care. Allow them to remain uncaring. We don't have to help everyone. People must want to be helped.


"Allow them to remain uncaring. We don't have to help everyone. People must want to be helped."

When people don't understand the implications or full ramifications then governments and lawmakers have to step in as they have a duty of care to protect citizens. It's one of the principal reasons for having government.

There are any number of examples, regulating the use of poisons, putting protection fences around cliff top lookouts, specifying the breaking strain of elevator cables, aircraft compliance design, removing lead from petroleum, and so on.

Unfortunately, governments have failed to act despite many warnings about these privacy matters.

Incidentally, there's an uncanny parallel between governments failing to act even when in possession of the facts and my last example. In 1923, when Thomas Midgley and cohorts—engine makers and petroleum companies—sought permission to put tetraethyllead in fuel, governments already knew the dangers of lead poisoning. Not only did they ignore all scientific warnings about the dangers of using the additive, but they also embraced Big Business and approved the move at the citizenry's great expense.


What they want is for things to be easy and require a low to nonexistent cognitive load. You start confusing them with details of what could happen and all the gyrations they have to do to avoid it, and they tune out and look at you like a tinfoil-hat crazy (are you sure they aren’t right?).

As the techno elite, it’s actually our job to create the underlying reality everyone else participates in when using technology. So, it is our responsibility to care, if you care. It’s not theirs - they’re just here for the party. But that doesn’t mean they’re sheep for slaughter, because there are plenty of folks ready to slaughter them for money.

It’s our ability to understand the issues and to actually improve them that uniquely makes it important for us to care. But we can’t expect people to turn off the cat video long enough to listen to us nerd at them, and we really can’t expect them to do something complex to avoid something they don’t understand or care about. Our challenge is this: how do we improve internet technologies sufficiently that everyone enjoys what we know is important, without requiring them to care? That’s how you build a better emergent reality.

I’m glad to have had a hand in the launches of Netscape and Mozilla, and have watched Firefox for years with pride. It is the closest thing to a mainstream everyman product that even remotely cares. WebKit Safari is a close second. I hope we all find ways to develop tech platforms that protect as well.


> are you sure they aren’t right?

Yes, I'm absolutely sure. Do I really need to justify myself here on HN of all places? On a thread about the fingerprinting implements of the surveillance capitalism industry?

> that doesn’t mean they’re sheep for slaughter

Welp. If they don't want to be slaughtered like sheep, they better start caring then. I'm done with that.

At this point what I really care about is strengthening my own privacy by having more users in the anonymity set. The more indistinguishable users there are, the more effectively we are protected. I figure that if they're apathetic enough to allow corporations to exploit them with absolute impunity, they're also apathetic enough to join the anonymity set. Browsers just need to make that choice for them. It needs to be the new default.

> we can’t expect people to turn off the cat video for long enough to listen to us

I can and I do. What we're saying about this matter is important. People should listen, join the discussion even. When we reach out to people about matters we consider important, we do it with the best of intentions. We expect they'll at least put some thought into it. If not that, we expect they'll at least treat us with some respect, not like some schizophrenic off his meds. Can't expect anyone to continue caring after multiple instances of that.

> Our challenge is this: how do we improve internet technologies sufficiently that everyone enjoys what we know is important, without requiring them to care?

Someone's gonna need to have the balls to make the choice for them. I don't have the resources to just make a better browser though. I do what I can by installing uBlock Origin on every single browser I come across. Everyone loves it and tells me that the web "feels" much better, though they can't quite explain why.


> Non-technical people deserve to have their personal data stolen

Nobody said that. "My defenses work" != "my defenses should be necessary".


"On the other hand if you insist on using JS, Gmail, Google search, Facebook etc. then you're fair game and you only have yourself to blame if your personal data is stolen."


... okay, I reread it for a third time and you're right. Not sure how I managed to miss it the first two times I read the comment. Yeah, that's nonsense.


Uh… they definitely said that. They specifically said that people were “fair game”.


Yeah? If they don't know how to operate a computer, then they shouldn't be operating one. I'd feel the same if someone without a licence crashed their car.


Using the web while being unfamiliar with Javascript is not analogous to driving without a license. It's closer to driving without being a mechanic.


But when my mechanic tells me that the grinding noise while braking means I need the brakes fixed, that doesn't excuse me from continuing to drive without fixing them, and it doesn't magically get fixed by turning up the radio until the noise goes away. To extend your comparison: devs are the mechanics, and devs have been screaming that operating browsers without blockers is like not getting the squeaking noises fixed. Everyone just keeps turning up the volume until the underlying noise goes away.


The more you customize the more unique your session becomes.


Not if you disable JS, because then the website can't see any of these customizations.


Except that disabling JavaScript is an anomaly all on its own. The dozens of users running without JavaScript might not be individually fingerprintable, but it's still a small enough cohort that I don't know how much I'd lean on that. Figure in the user agent string and it's probably a unique enough subgroup to sell ads to.


> it's probably unique enough a subgroup to sell ads to

I have been browsing with JavaScript disabled by default for the past 6 months. Based on my experience, no-JavaScript ads are rarer than a four-leaf clover.


More common than you think

https://amiunique.org/fpnojs


Probably not a representative sample though.


Also, that cuts down the group so much that I imagine other things that are usually too coarse-grained to be useful suddenly become much more useful. E.g., GeoIP location or Accept-Language headers.


> Figure in the user agent string and it's probably unique enough a subgroup to sell ads to.

But if you never see ads how do you sell ads to them and how do you meaningfully discover enough about the person to feed them valuable ads?


But ads don't work without javascript.


Having JS off probably puts you in the <0.1% of users bucket. Unless you additionally are:

* Routinely moving between IPs
* Modifying your headers to avoid giving away info (user agent, etc.)
* Defeating all the other non-JS things that fingerprinters probably look for

then you are not safe by just turning off JS.


Being in a 0.1% bucket is only ~10 bits of information - much less than what can be gathered with JS on.

And of course it's not enough, but the situation is even more hopeless with JS on.
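The "~10 bits" figure falls out of the standard surprisal formula, so as a sanity check:

```python
import math

def surprisal_bits(p):
    """Bits of identifying information revealed by an attribute shared
    by a fraction p of users."""
    return -math.log2(p)

# A 0.1% bucket reveals about -log2(0.001) = ~10 bits; identifying one
# person among ~8 billion requires roughly 33 bits in total.
```

So a rare attribute like "JS disabled" is a big step toward uniqueness, but still far from the full budget a rich JS fingerprint can collect.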


> Not if you disable JS, cause the website then can't see any of these customizations.

That's adorable. I guess you're not old enough to remember when we used to track people with things like invisible pixels. Or today's equivalent: testing CSS parameters.

Neither require JavaScript, and there are a hundred other non-JavaScript methods.


With that method, you won't be able to distinguish between the many different devices using the same browser at the same resolution behind one IP.

In the era of CGNAT, that means you now only know which city I'm from and whether I use Chrome or Firefox. People mostly use browsers maximized, and resolutions are relatively standardized nowadays.

Compared to the data you get from canvas and webgl, that's much less unique.


Hah! I used to embed invisible pixels for our marketing department decades ago.


This should automatically qualify one to lose their internet privileges. Not just the fact that you did it, but your cavalier attitude towards it and the lack of regret for having done it.


I actually agree with you. However, I plead that it was novel at the time, used ineffectively by a tiny marketing department, and not anywhere near this spy level capacity achieved today.


It’s fun to put Easter eggs for people like you. https://once.getswytch.com


I have no idea what you’re talking about. That URL only tries to load one piece of JavaScript, htmx, and all it does is unbreak the mobile navigation.

(Aside: this mobile navigation is, incidentally, the worst implementation I have ever encountered: instead of twiddling some classes or such, which would happen instantly, it makes an HTTP request that responds with the new navbar. For me, this means at least half a second’s latency on clicking the button, more if time has passed so that the HTTP connection is no longer open (1.5–2 seconds). It also fails the no-JS test, as the unintercepted form-submit just serves the page with the closed mobile navbar again, not switching out the navbar as I expected it might, and which would have been enough to avoid an unconditional “worst implementation” award. Sorry if you made this and it hurts your feelings, but… ugh, this is just a baffling misapplication of hx-post and naive Tailwind use, and just unconditionally a bad approach.)

Edit: better link which shows what I suppose you probably meant: https://once.getswytch.com/app


Haha. Yeah, it is pretty terrible and I made it.

It’s mostly a tech demo, so the things it does are intentionally weird/strange.


You are easily tracked without JS. It is much easier than tracking a default settings browser.


> It is much easier than tracking a default settings browser.

Not true. Especially if you mean a default browser with Canvas/WebRTC APIs enabled.

It is much more difficult for fingerprinting companies to get a high entropy fingerprint from a no-JS user.


Good luck with Lynx/Dillo with a fake UA.


It's important to know that the mentioned "resistFingerprinting" breaks a lot of the web.

Examples include the back button breaking, and photo uploads on some websites sending random data instead of the photo, etc.


If it breaks uploading a photo, it’s because the page unnecessarily copies the image into a <canvas> and then tries to upload the data from the <canvas> instead of the original image.


> the page unnecessarily copies the image into a <canvas> and then tries to upload the data from the <canvas> instead of the original image.

Surely there could be valid reasons for doing so?

I imagine for example that:

1. It ensures the selected file is a valid image before uploading it

2. It strips meta data like GPS position from the image before uploading it

3. It could reduce the size of the image, by either scaling it down, or compressing it more, or both, before uploading it


These are valid use-cases I agree. However I don't see why <canvas> should be leaky to support those use-cases.

Browsers should ensure all <canvas> operations produce identical results across platforms and hardware, and anything in the spec that prevents this should be removed from the spec.

Now, I recognize some of that functionality is handy for certain apps. In that case do like Android and put it behind an opt-in API, so the user can deny.

Basically, I think browsers need a "web app" mode and a "surf mode". Just visiting my local news outlet shouldn't require all the fingerprinting stuff.


The real snag comes from putting text into a canvas. Nobody can agree on what fonts they have installed, and of course there are all kinds of subtle variations from one version of the “same” font to the next, and then everyone has different ideas about hinting, kerning, stem widths, etc, etc, etc. You can fingerprint basically everyone just from that information alone.
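A hedged sketch of how such metric differences could be folded into an identifier: the width numbers would come from something like canvas measureText in a real tracker, and are purely illustrative here.

```python
import hashlib

def font_metric_fingerprint(widths):
    """Fold per-font text-width measurements into one identifier.

    `widths` maps font name -> measured pixel width of a fixed test
    string. Even sub-pixel differences in hinting or kerning change
    the hash, which is why font rendering is such a strong signal.
    """
    material = ";".join(
        f"{font}={w:.2f}" for font, w in sorted(widths.items())
    )
    return hashlib.sha256(material.encode()).hexdigest()[:16]
```

Because the hash changes with any single measurement, two machines only collide if every font renders the test string identically, which across font versions, hinting engines, and installed font sets almost never happens.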


Every browser can agree on a specific font if they truly cared about end-user privacy.


Sure fonts and text is hard. But none of that is needed for basic surfing of the web.


In that case you should use Firefox, and turn on “resistFingerprinting”. It’s not perfect, but it’s approaching real privacy.


Or the Arkenfox config (https://github.com/arkenfox/user.js), which enables resistFingerprinting among other essentials. In this kind of game, a community config is exactly what you want.


Either there is a complete and total consensus on every aspect of rendering or there are differences in how <canvas> is rendered.


Ciphers and hashes publish test data so you can ensure conformance. Don't see why, in principle, one couldn't do something similar with a stripped down <canvas>.
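In sketch form, a canvas conformance vector could work exactly like a cipher test vector: publish the digest of a reference rendering and require a bit-exact match (the reference pixel buffer here is invented for illustration):

```python
import hashlib

def digest(pixels):
    """Digest a canvas pixel buffer, like a cipher test vector."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def conforms(pixels, reference_digest):
    """A conforming renderer reproduces the reference output bit-exactly."""
    return digest(pixels) == reference_digest

# Hypothetical reference vector published alongside the spec:
reference = [0, 0, 0, 255] * 4  # RGBA for a 2x2 opaque black canvas
REFERENCE_DIGEST = digest(reference)
```

The catch, as the surrounding comments note, is that bit-exactness rules out most hardware acceleration and platform font stacks, so the spec would have to mandate a software rasterizer and bundled fonts for the stripped-down mode.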


> I think browsers need a "web app" mode and a "surf mode"

Agree. It will be hard to define a standard for "surf mode", but in addition to privacy benefits there would be security benefits for the browser container as well.


I don't think it would be that hard; start with "no JavaScript" and add a better compatibility method. Ideally, add ways to get the browser to do common stuff like resize images, although even saving that for "app mode" would be a big improvement on the current situation. Making the standard is easy; it is getting anyone to follow it that is difficult. Sites could already work great without JavaScript if they wanted to, but very few do.


"No JavaScript" is a non-starter in my opinion. That's a very simple on/off switch that is already available but has very little buy-in. As you noted, a "JS off" mode requires a shift in what HTML/CSS are capable of on their own.

> Making the standard is easy, it is getting anyone to follow it that is difficult

That's my point: those two parts aren't disconnected. The standard isn't useful (or really a standard) until people follow it, and in this case that's most of the internet-connected world: both people building for a new default subset, and users accepting a default subset with opt-in "web app" bells and whistles.

Without removing JS, in my head it's along the lines of starting with a freeze of a current ECMA version: define the APIs that are stripped out, force low-fidelity timers, remove JIT, limit some cross-origin options. Stop adding shiny new features every 8 weeks. Keep it there for 3-4 years. Or maybe a similar concept with a WASM container, when that gains some browser usefulness. Then there's the HTML and CSS subset too. So, defining that stock subset navigator at the right level is what I see as the "hard" part.


There are some improvements that could be made to HTML/CSS, but it is already possible to do a bunch of fancy stuff with no JavaScript. I don't think it is possible to avoid tracking while allowing JavaScript, unless it's only the most trivial JavaScript, and for that there are likely already HTML/CSS alternatives. The stuff you are talking about is already available if you dig into the settings, although of course picking and choosing your own collection of settings like I do is itself a unique identifier. But there need to be a bunch more restrictions to actually prevent fingerprinting.

I think the lack of buy-in is because the people who would need to buy in are the ones pushing the tracking. Rather than a new standard, something like a directory of sites that work well without javascript (and a search engine covering just those sites), with enough people using it that being listed is an advantage, seems to me more likely to be effective.


> Browsers should ensure all <canvas> operations produce identical results across platforms and hardware, and anything in the spec that prevents this should be removed from the spec.

You would basically have to kill all hardware accelerated features and run everything in an interpreter. Also make sure that turbo button is set to slow, to get consistent behavior across all CPUs.

The only real way to prevent fingerprinting is to lock these features away by default and force websites to beg for every single one of them. Not an "accept all" screen; make the process so painful that 90% of users would rather avoid those abusive sites entirely. Basically the same dark-pattern shit every site pulled with the cookie and GDPR accept popups, just in reverse.


3 sounds incredibly undesirable to me, assuming we’re dealing with a jpeg. Go through 3 or 4 rounds of that and compression starts to get pretty visible.


Most websites will recompress user images. Although you probably don't want to do it client side.

The biggest reason is of course cost saving: store and transfer smaller images. This could be done client side with a server-side check on max size.

Another big reason is metadata stripping. Both to protect the user (can be done client side) and to avoid unintentional data channels being provided.

Another reason is to avoid triggering exploits. If a major browser has a JPEG rendering exploit, Facebook doesn't want you to be able to pwn everyone who sees your post. By using a trusted encoder, it is very likely that the produced image more or less follows the standards and is not likely to trigger any exploits (as exploits usually require invalid files).


I've had to implement this - we have a web app used by engineers in the field where signal is often not great. We got lots of complaints about image uploads as for a typical job there would be potentially 100+ images that needed to be uploaded (multiple assets with 2 before and 2 after photos per asset).

iPhone defaults to uploading a large image which can take ages to upload. We implemented a canvas based solution which sends a base64 string representing a compressed image and reduced the upload file size by about 90%. We don't need high quality original images in the backend.
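For what it's worth, a rough sketch of that kind of canvas resize follows; all function names, the max dimension, and the JPEG quality here are my assumptions, not the actual app's code. The dimension math is a pure function; the canvas part only exists in a browser.

```javascript
// Pure helper: scale (w, h) down so the longest side fits maxDim.
function computeTarget(width, height, maxDim) {
  const scale = Math.min(1, maxDim / Math.max(width, height));
  return { w: Math.round(width * scale), h: Math.round(height * scale) };
}

// Browser-only: draw the full-size photo onto a smaller canvas, then
// export it as a compressed JPEG data URL ("data:image/jpeg;base64,...").
function compressForUpload(img, maxDim = 1600, quality = 0.7) {
  const { w, h } = computeTarget(img.naturalWidth, img.naturalHeight, maxDim);
  const canvas = document.createElement('canvas');
  canvas.width = w;
  canvas.height = h;
  canvas.getContext('2d').drawImage(img, 0, 0, w, h);
  return canvas.toDataURL('image/jpeg', quality);
}
```

Strip the `data:image/jpeg;base64,` prefix before sending if the backend expects the raw base64 payload.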

I may have missed a trick, this has been in place for a few years now but at the time I couldn't find a better solution.


I was under the impression that base64 encoding doesn't reduce file size of an image at all, rather it sometimes increases it. That wasn't the point of using base64 string, right?


> a base64 string representing a compressed image

Parent explained that the base64 encoding held compressed data.


But why base64 and not just… send the bytes? 6 bits per character vs 8
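The size cost of base64 itself is easy to quantify with a quick Node sketch:

```javascript
// Base64 maps every 3 input bytes to 4 output characters, so the payload
// grows by about a third (plus up to 2 padding characters).
const raw = Buffer.alloc(300000); // stand-in for a ~300 KB JPEG
const b64 = raw.toString('base64');
console.log(b64.length);              // 400000
console.log(b64.length / raw.length); // ~1.333
```

In practice people often reach for base64 because it drops straight into a JSON body; sending the bytes via multipart/form-data or a Blob upload avoids the ~33% overhead.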


There are many perfectly valid reasons to do that. It’s a lot more scalable to resize images client side rather than server side and using a canvas is one of the simplest ways to achieve that.


It's necessary if you have filters in the upload flow, like Instagram does (which is why it breaks.)

Or it might not be strictly necessary, but Instagram does it anyway.


> unnecessarily

No, this is how most pre-upload image editors work. Why upload a 5MB avatar photo that you're going to have the user crop and scale on the client side to a few hundred KB first?

Using canvas for this is much more friendly to their bandwidth, no nefarious intent needed.


It also breaks page zoom. The user's preferred zoom level for a domain isn't preserved between new-tab page loads, but resets itself every time.

(I'm guessing it was too much implementation work to separate out this feature: to preserve normal, expected UI behavior client-side, while presenting a fake pagezoom value to scripts. That would degrade only a handful of (poorly-designed, script-layout) websites, rather than the whole accessible browser experience).


Yeah, I enabled the option yesterday after learning about it; today I disabled it again, since NOPE, without site-specific zoom settings retained the web is too inconsistent for me.


I just tried putting it on with the idea of trying it out for one workday to see if it breaks something. It immediately broke favicons on my GitLab tabs (turning them into random vertical stripes of pixels), which is both odd and a pretty bad start.

I really like the idea behind this feature, but it seems the Web API might have become too complex to counteract bad actors like this. It's particularly scary that it can correlate your activity in private mode with your identity in normal mode.


RFP randomizes Canvas data extraction by default, which might have something to do with it. The GitLab favicon seems normal to me when I navigate there (RFP on).
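Something like this toy sketch (not Firefox's actual algorithm) is roughly what RFP-style canvas randomization does to readbacks: perturb the low bit of each pixel channel, so extraction gives a slightly different answer per session while the image still looks normal to a human.

```javascript
// Toy canvas-readback randomizer: randomly flip the low bit of each
// pixel byte. The image is visually unchanged, but hashes of the
// extracted data differ between readbacks.
function randomizeReadback(pixelBytes) {
  return pixelBytes.map(b => (Math.random() < 0.5 ? b ^ 1 : b));
}
```

A favicon rendered through a canvas and then re-read could plausibly get scrambled by this kind of interference, which would explain the GitLab glitch.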


I've been using it for years. I've barely noticed.


It also tells websites that you want a light color scheme (instead of not indicating any preference).


For the photo problem you can give explicit permission for the website to use Canvas and then reupload the photo. It's annoying, but oh well.


This is FUD. As others have said, been using RFP for years and barely noticed.


I'm not saying don't use it, I also have it turned on. I'm saying that it has consequences, and you might not immediately realize it's related to RFP.


There certainly are consequences. However, you said it "breaks a lot of the web" including "the back button".

Maybe this is the case with some very complicated SPA type sites, but personally, I've never seen this.


Yes, all the problems are on very complicated SPA type sites. You know, like Google Docs/Drive, YouTube, Facebook, Instagram.


Another method for web fingerprinting is GPU fingerprinting [0], codenamed 'DrawnApart'. It relies on WebGL to count the number and speed of the execution units in the GPU, measure the time needed to complete vertex renders, handle stall functions, and more.

_______________________

0. https://www.bleepingcomputer.com/news/security/researchers-u...


As the years pass, I keep thinking back and realize that Richard Stallman was right all along:

> For personal reasons, I do not browse the web from my computer. (I also have not net connection much of the time.) To look at page I send mail to a demon which runs wget and mails the page back to me. It is very efficient use of my time, but it is slow in real time.


I think Stallman just shot himself in the foot by even revealing that much. Unless a lot of people do the same thing, it's very easy to conclude that it was Richard Stallman who sent that wget request, granted a few variables. The difficult part is perhaps tracking it back to its actual source, but I don't think Stallman is that hard to find. All this is of course extremely chilling. I'm sure a profile could be built up around wget requests, and then, employing some "likelihood machine" on it, educated guesses could be made as to how likely it is that a given wget request actually came from Richard Stallman. I think we've just stumbled upon a new and "fun" Where's Wally game here!


I actually did exactly that a while ago. Where I worked, we didn't have internet access but we had email access, so as a workaround, I made an email server on my home machine that fetched web pages for me. A coworker took it even further and made a proxy server that automated the process so you could actually browse the web, although very slowly. Just to say that Stallman is not the only one with this idea.

It was in the early 2000s, and smartphones weren't a thing. It also was a time when companies were paranoid about letting employees access the internet, but at the same time had abysmal security. By that I mean viruses ran free on shared folders, undetected because their antivirus software was years out of date. Very different times...


In the early 2000s I was working as an analyst for a VFX studio, and I had a meeting with the CFO first thing in the morning. At some point we needed to look at something on his computer, and he responds "we'll have to wait about 15 more minutes." "Why?" I ask, and he shows me: every day when he turns his computer on, the browser starts producing "pop under" windows at a rate of about 10 per second, and that lasts for about 40 minutes. Shocked, I debate with him for a bit and he says they've tried every antivirus. I shake my head. When they stop popping windows he has a program that mass-closes all of them, and then he goes to work using that same computer - and for studio finances. Blew my mind.


In Germany there is a "WhatsApp" SIM [1], where you have to pay for normal internet use, but WhatsApp texts are free of charge.

With a technique which you described, you could probably abuse a phone with this SIM as a "free" hot spot with infinite data.

[1] https://www.whatsappsim.de/


A lot of paid Wi-Fi hotspots allow DNS traffic through unmolested, that's a similar loophole


Maybe one could use something like this https://github.com/yarrick/iodine


I'm pretty sure there was something, within the last year or so, on HN front page which used [exploit/protocol/hack] to browse wikipedia over [SMS/tweet/etc.], or similar


> Guthabenaufladung mind. 5 € alle 6 Monate zur Verlängerung des Aktivitätszeitfensters.

So it costs at least 83 cents per month. Still might be worth it compared to the insane mobile data charges here if you can get a usable bandwidth. I suspect in practice they will just ban you if you abuse it like that.


I recall one time in 2015 or 2016 when I had only a very weak 2G signal, but wanted to check a couple of pages (at least one of which was several hundred kilobytes). Connections always timed out in browsers, but I got it working by SSHing into my VPS, downloading the page with curl, then copying that down with scp. My recollection is that the file size would increase by 32KB every 15–30 seconds. Fun times!


Mosh works over 2.7 KBPS capped data plans. I connected to a tilde and just use lynx/links/edbrowse for light www/irc/jabber and gopher. It runs much faster than being connected natively to the inet. I can read everything and even answer in fora with Edbrowse.


> It also was a time when companies were paranoid about letting employees access the internet

In the early 2000s I was working at an insurance company. They used some kind of blocker in their outgoing firewall that prevented access to certain sites. At one point the blocklist included SourceForge, which threw a wrench into my team's work, because at the time a lot of the packages we depended on were hosted there. It took a few days to get that removed from the blocklist.

This same insurance company shut down for multiple days when a virus, I think it was ILOVEYOU, infected their email system so badly that nobody could work, and everyone (except the poor IT folks) got a long weekend. And then a while later it happened again, but with a different virus, possibly Nimda. The company was very bad about updating its systems, and even in 2003 most users were stuck on Win95.


Company I work for still blocks SourceForge because... something bad happened 15 years ago (?).


Enterprises.


You gotta be a little more specific.


> It also was a time when companies were paranoid about letting employees access the internet, but at the same time had abysmal security.

I recall we had a crappy firewall that would collapse under the load of NAT for the 100ish employees and so executives got static IPs mapped to their machines. The late 90s and 00s were crazy.


> so executives got static IPs mapped to their machines. The late 90s and 00s were crazy

In my Uni days, all our department's machines had public IPs; no NAT, no firewall(!)

So much simpler to be able to telnet, FTP and/or remote desktop straight from home to the office :)


Same at my University in the mid-90s. I was the CS department network admin and we had an entire /24 to use as we liked.

At least it taught me how to detect attempted hacks early because every machine had to be monitored for attacks.

I just looked and they still have a /16 (65k public addresses). This is for a school that has maybe 15k students, not all of them living on the campus. And I’m sure most of the computing takes place in the cloud now anyway.

I know there are a lot of places who were on the Net early besides the military that have excess address capacity.


I did the reverse: I had an SSH connection to my home machine and ran an IPv4 tunnel back through it so that I could browse the entire corporate intranet from my home network, creating a full VPN essentially. Made me about 10 times as productive as going through the dial-up we had to use while we were on call.


Stallman shot himself in the foot by having a text-only blog that was easily searchable when it came time for the wolves to cancel him. A crappy proprietary blog or thousands of hours of ranting via YouTube videos would, ironically, have slowed down the haters and maybe even caused them to miss things with which to cancel him. It's hilarious in an ironic way. Bonus points if the cancelers were running GNU software. :D


You make it sound like he said something mildly insensitive. He was "cancelled" for making pro-cp comments, and for literal decades of being a creep. https://twitter.com/_sagesharp_/status/1173637138413318144


And how does that invalidate his technical expertise?

More importantly, was he charged and convicted with anything?

I am getting really tired of this public opinion tribunal, where mere accusation is enough to get a person out of their position. This is not how this is supposed to work at all.


No you don't get it. He was a "creep" aka should be cancelled wholesale on everything /s

The only questionable thing he did was try to rationalise pedophilia, which he has since changed his mind about. Given that he's clearly _not all there in the head_ (i.e neurodiverse) and assuming he hasn't tried to access child pornography or similar I couldn't care less. All of the other accusations against him are nonsense[0] and all center around unsubstantiated rumors of him being a "creep." People being anti-Stallman is insane to me considering how much he has contributed and advocates for not only free software but also gender equality.

[0] https://stallmansupport.org/debunking-false-accusations-agai...


> Given that he's clearly _not all there in the head_ (i.e neurodiverse)

Neurodiverse people are "not all there in the head"?

gtfo.


That's right, the guy who sits down, takes off his socks and eats his toenails/toe skin in the middle of a Q&A[0] shouldn't be held to some ambiguous standard of how to navigate society.

It's exactly this eggshell stepping that people are expected to adhere to that has people treating him the way they are.

https://www.youtube.com/watch?v=I25UeVXrEHQ&t=110s


Actually, he was "cancelled" by people lying about him supporting Epstein and saying child rape is good, neither of which were even close to being uttered.

The disingenuous nature of this all is why he's back in his foundations again.


wget can be pretty trivially told to send custom headers.


Try to do that to a site with CF bot protection cranked up... Not happening without a custom build/custom ssl proxy that mimics the SSL fingerprint of Chrome.


CF blocks you hard simply for enabling "do not track" in settings. The discussion about how awful they are needs to be had.


I haven’t seen a custom build of Wget, but for Curl there is curl-impersonate[1].

[1] https://github.com/lwthiker/curl-impersonate


It would be a lot of work to make it mimic a common profile though.


That work was probably done once, years ago. Might need a few string tweaks every few years, which could be automated.


No, because any “RMS” set of headers would only be shared by the small number of nerds who care, fingerprinting us more accurately again.


Just set up a honeypot and use headers from there ;)


Maybe he's setting a false trail and using curl


Maybe the script does:

    wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" ...
Nobody will suspect a thing!


Are we losing the point here? "Does not browse." When interacting with webmail he also does not browse directly, preferring CLI scripts to act as intermediary. Does wget execute .js or .css, or act on anything it reads beyond a URL redirect? Is wget a huge attack surface like a browser?


The GNU/Hurd in the IA didn’t already do it?


I'm pretty sure wget has plenty more users in addition to Stallman


I'm sure too! But there are some very important differences between Stallman's use and "my" use. Personally I use wget all the time to get specific stuff, mainly downloads of binaries, say for some UX system I'm setting up. I'm fairly certain that this is the most common use of wget, so all that can easily be filtered out. This leaves Stallman's use case, and a few other secretive users, who I'm sure can also be divided into separate categories that can then be used to further identify each user's uniqueness. I'm not saying that it's easy, but I'm saying that he's got a higher chance of getting "caught" simply by revealing his rather unique use case.


Not to mention its programmatic use by tools and applications.


There are so many bots sending wget requests that I don't think it's a real issue.


wget can mask the User-Agent and lots of other variables.


Hard to watch Netflix or YouTube this way. Considering I have just learned electronics design from YouTube, this is inconvenient.


For Youtube, try invidious or yt-dlp


Also mpv calls yt-dlp automatically. Add a browser extension to launch mpv for links/the current page and the experience is so much better than in-browser video: native controls, video window can be placed anywhere, full power of ffmpeg.

I am continuously baffled by how most people just accept media companies controlling the video player you are allowed to use, and thereby the UX. Don't let them.


I use DilloNG, an Invidious instance and mpv being called from the page with a right button click.


mpv is perfect for this


A more modern equivalent might be using something like archivebox instead of browsing directly.


& . . . . @

"Mail for you, sir!"


Why is this being fought with technical measures (which are ineffective and cripple the web as a platform) instead of legal consumer law where you can easily fine and punish companies that do the fingerprinting?

EDIT: Note that you can do BOTH - but one without the other is just a game of whack-a-mole.


Because some browser-makers (Firefox at least) believe that the identity of those browsing the web should be protected. Legislators do not believe that. (At least, a majority of legislators do not.)


What kills me is the cookie consent stuff. They should have enforced that Do Not Track is honored, with fees that make sites ensure compliance or be sued for not honoring DNT, which IIRC was sent as an HTTP header. That would have actually been a meaningful solve.


What a legislator believes is irrelevant. Only what the lobbyist is paid to believe is relevant.


Would you consider the entire European Union a minority of the legislators? Because that's what GDPR is designed to do, make identifying customers well controlled and expensive whatever the method.

Granted, the enforcement should be stepped up.


> Would you consider the entire European Union a minority of the legislators?

No, I was referring only to my own legislators (in the US, and not specifically California). Many other places in the world are doing better.


They should enforce that Do Not Track is honored. It's the easiest way, and websites don't need silly cookie consent dialogs if it's set.


DNT is ~useless because it's opt-out, whereas "auxiliary", non-essential tracking is opt-in under GDPR.

Websites don't need cookie consent dialogs if they only use cookies to do things that don't need to be consented to, like providing the service they are offering. Look at Apple's website, they don't have any.


DNT may be opt-out. But it should certainly be treated as "Don't even bother asking for consent to track, because I already told you the answer is no, and you'll be harrassing me by asking."


My argument is current laws did nothing to give teeth to DNT. I'm not worried about what the technological defaults are, but I would argue that without DNT being legitimized, it was dead on arrival. We have had it in browsers for ages, and we've dropped the ball on enforcing it for ages.

My other argument is, if you detect DNT, the cookie consent dialog shouldn't be shown at all.
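Mechanically, the idea would be something like this server-side sketch (illustrative names only; nothing legally requires this behavior today, which is the complaint): if the `DNT` request header is `1`, skip both the tracking and the consent dialog.

```javascript
// Honor DNT: a request with the "DNT: 1" header has already answered the
// consent question, so don't render the dialog (and don't track).
// Node lowercases incoming header names.
function shouldShowConsentDialog(req) {
  return req.headers['dnt'] !== '1';
}
```

A few sites did implement exactly this voluntarily, but without enforcement the header was so widely ignored that browsers have been deprecating it.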


The EU has about 1/18th of the world's population, so certainly that would be a rather small minority.


A law needs a justification and needs to apply equally to everyone. Writing that about fingerprinting would not be trivial. Some site operators can make a believable argument that they use it in ways that are good for society.


"Some site operators can make a believable argument that they use it in ways that are good for society."

Example please


My bank phoned me last summer. I'd authenticated with my usual two factors but a new browser fingerprint, then transferred a large sum to a new recipient. The bank blocked the transfers I did that day, then phoned me to check whether I'd been phished, suffered a keylogger attack or something.


So you were inconvenienced as a result of a false positive derived from tracking. Hardly a great argument.


You can make anything seem poor if you mention the negative effect and not the positive one.


Credit Card Fraud, Spam, etc


Even if this were the case - which I don’t actually believe, but… - it would be straightforward for that law to also constrain these purposes and prevent data sharing with non-worthy operations. At present it’s basically a free for all.


That is literally what GDPR is. Somehow it got reduced to cookie banners in HN psyche, but the whole idea of GDPR is to make sure that the data can be collected and used for well defined purposes that are either necessary to provide a service (preventing CC fraud would qualify), or are explicitly consented to.


The problem with the consent part is that you basically can't take part in the modern world if you don't consent. The theoretical possibility of opting out is undermined by deliberately bad UX.


Not really, there are plenty of plugins that dismiss the popups automatically without consenting to marketing. But generally we should press our legislators to require a universal interface that then can be automated, not try to win a cat-and-mouse game against multibillion conglomerates whose income depends on winning it.


I think the misunderstandings about the GDPR (even many smart people don't get it) prove that designing and writing such a law is difficult and the result has to be complex.

IMO the GDPR is good. But… it is poorly understood by many affected people. IMO if a law is poorly understood by the people it affects, then one should assume the law to be at fault, not the people. IMO it's good but I'm not happy.

4×IMO! Wow.


> IMO the GDPR is good. But… it is poorly understood by many affected people . IMO if a law is poorly understood by the people it affects, then one should assume the law to be at fault, not the people.

You are assuming that it has to be either of them who is at fault. In reality there are third-parties who have been spewing FUD in order to confuse people about the law.


I do indeed assume good intentions. Doing that is one of the principles by which I live.


The short answer which should be obvious... regulatory doesn't work, legal doesn't currently work.

The burden of proof is on the claimant, and with proper information control you can't ever meet that burden of proof. It becomes an ant versus a gorilla instead of David vs. Goliath.

Tell me, how do you differentiate a simple random alphanumeric string from another random string that may have been generated as a fingerprint?

Mathematically, do you think there's any way to actually prove it one way or the other? If not, how does that bias the system if the person is adversarial and lies?

The only way to prevent this is to make sure the information is nonsensical.

Preventing collection would identify you in a way that they can prevent access. Even though websites are public, you see this happening with any captcha service.


Can you provide any proof that "regulatory doesn't work"?

Might be my European outlook, but consumer law has been stupidly effective at curbing abuses from companies here, and was much more effective than playing the technology race the USA is trying to fight. There's always a next side-step, the next abuse a company can invent - and you keep trying to push the responsibility of avoiding it onto users (by adding more and more onerous technology) instead of punishing the abusers.


You don't need proof you just need some sound reasoning about the trends. If it were as effective as you claim, progression in this area would have halted full stop.

Ask yourself how long those consumer laws have been in effect. Has this technology problem progressed during that time (increased or decreased)? Have the fines against the large tech companies actually been collected, and were they sufficient to curb that behavior, or are they still being administrated or adjudicated (decades later)? Have the large tech companies provided all of the information they collect for review (including the intermediates they generate internally from processing, in a way that discloses all the ways they use it), or did they only provide a plausible alternative, or just the base information collected without explanation? Do you have a way to prove it's the former and not the latter?

I'm sure consumer law has been effective at eliminating the provable abuses domestically. If they were effective internationally, why would the problem be progressing to ever more complicated ways of ubiquitous tracking (which are against that law), or even domestically for those multinationals.

It's business as usual, and these people know centralized power structures suffer structurally from corruption and malign influence, and as a market force they exploit that.

There's enough money in people's futures that no fine will actually solve the issue because fraud gets baked into the process. Privacy, communication, and agency are what largely compose people's future.

Due process from corporate sovereignty guarantees they can draw it out as long as they need to while continuing to make money off their actions, both increasing costs to regulatory (as a resource drain), and increasing revenue.

The real cost is borne by either the individual or the public, and corporations have an incentive to lie in ways that are difficult or impossible to prove. A lie of omission is a lie.

In my opinion, for certain critical societal protections, it's necessary to have guilty-by-default for 'people' whose only possible motive is profit incentive. Corporations and firms are considered people in most locales, but they only adjust behavior based on profit or future profit (through monopoly).

Placing the burden of proof on the company to prove they are complying, instead of compliant with good faith protections by default, would eliminate most benefits they might receive from deceit, or lying through omission.


Just so we're clear - the consumer law has mostly not been adjusted to cover data mining yet and you seem to be building your argument on the assumption that it has.

Am I correct?


As far as I was aware, it had. Everything I've seen in the last 5 years points to that. Is that not the case?

Granted, I didn't go directly to the regulatory site, because who can sit down and analyze multiple legalese documents that have thousands of pages with cross-referencing requirements?


Here's a bunch of consumer laws that work:

- living in the UK, I barely ever receive spam calls or messages. I can be reasonably sure that companies don't sell my contacts to third parties, I can withdraw my consent to marketing communications and spam will stop, I did it multiple times. My American friends seem to have way more problems with that, to the extent of buying burner phones to buy insurance. Considering that the tech is exactly the same across the pond, the difference is entirely in the legislation and consumer protection.

- cars became much cleaner and more efficient over the last three decades thanks to the ever ratcheting Euro standards. I only need an old car passing by to be reminded of that, you can just smell the difference.

- my broadband connection has a minimum average speed guaranteed by law, which protects me from the line being oversubscribed. This actually works, and a friend of mine got a sizeable compensation for a period when they didn't get the full speed.

So consumer laws work, and saying that enforcement can't be done is a bit of a post-hoc rationalisation. It is true that GDPR can and should be enforced harsher, but it's just one example in a long and successful history of consumer protections.


I'll keep in mind points 1 and 3.

As for cars, how do we know that's true? There was Dieselgate, but from what I've heard they only got them because of whistleblowers.

Many VOCs which these laws are designed to reduce are odorless. The ones that are visible are larger particles, and generally less of an issue from an environmental perspective by most accounts.


You can literally smell it in the air; older cars don't have cats to burn everything uncombusted down to CO2+H2O. You can smell it with a modern car for the first few minutes while the cat is heating up. You can see it in car shapes: there's a reason why every modern car looks the same, since aerodynamics and pedestrian safety make car shapes converge. You can see it in the ubiquitous cans of AdBlue at petrol stations, which were not a thing just two decades ago (and still aren't in many developing countries).

Finally, you can see it numbers: https://www.asm-autos.co.uk/workspace/images/yearly-co2-emis...

There is no fundamental reason why all those changes had to happen, it wasn't the market driving them. It was the regulation.


> Might be my European outlook

How did the EU cookie laws and GDPR solve this problem? It's as widespread as before, except that now you are annoyed by prompts too.


I think it absolutely does work.

We need better regulation to temper capitalism.


That's very naive, and you need to educate yourself about what capitalism actually is because it certainly isn't what you are saying.

You've misused that term.


No. You're incorrect.

We need limits to prevent capitalism from doing its worst.

It's only fair that we all live and work with the same limits.

This is the type of regulation that is necessary.


Because like the climate crisis, it’s easier to make the individual clean up the mess and make the changes than hold large organisations accountable.


> Because like the climate crisis, it’s easier to blame the individual than clean up the mess and make the changes.

FTFY?


Laws only apply in some countries. The internet is global. Technical measures are faster, more effective, and can be applied in all places.


Laws can be as global as those in power want them to be. See e.g. copyright.


Because bad actors have an easy time on an actually global network. It's disturbingly hard to hold bad actors accountable, particularly if they have zero legal presence (e.g. a corporation's subsidiary) in one's jurisdiction.


Is it really that hard? I haven't seen anyone from the US actually attempt any accountability - zero punishments for spam callers, zero punishments for data collectors, not even a semblance of an attempt to punish data traffickers?


The thing is, there are so many layers upon layers between the end user and the bad actor that it's hard to pin down blame, and even if one succeeds to identify a bad actor, it's a shell company somewhere overseas and the money is long gone, moved off via a dozen other shell companies - and to make it worse, what may be a crime in the US/EU is perfectly legal in wherever these shell companies are set up.

The solution would be dedicated laws that hold the company at the top directly accountable for the actions of all sub-contractor layers, but these laws are rare and often hotly contested (e.g. with a German law mandating responsibility of the top-layer company for wage theft and other labor law violations [1]).

[1] https://www.ihk.de/regensburg/fachthemen/recht/arbeitsrecht/...


But we’re talking here about major corporations who would (largely) follow the law if there was a law with teeth commensurate with the potential rewards form abuse of privacy.


That law already exists. It's called the GDPR. That's what it's for, and what you’re giving permission for when clicking "accept all".


Look, forget about threat models. It's relatively trivial these days to avoid fingerprinting attacks if you want to (as a private, web browsing individual).

I use fingerprinting actively in enterprise apps as a form of silent 3FA. It's a useful backstop. If I have a user who forgot their password but retrieves it via email, I'll usually let them pass if their fingerprint matches one of their priors; otherwise my software shoots off an email to their immediate superior to make that manager validate that the machine the employee is using is one they can vouch for.

I've always viewed browser fingerprinting as something that can be leveraged as a security feature. It's far more useful for that than for some sort of distributed tracking. I'd never want to live in a world (ahem ... China) where submitting to such fingerprinting actively was mandatory, or politically punishable if you didn't. No society should be run like an employer/employee organization with that sort of lack of trust. No sane free person would allow their own browser to transmit a fingerprint. But for employer/employee systems management? It's a great tool in the box.
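The "silent 3FA" flow described above can be sketched roughly like this. This is a hedged illustration, not the commenter's actual system: the attribute names, the `known_fingerprints` store, and the escalation behavior are all invented.

```python
import hashlib

# Rough sketch of fingerprint-as-silent-3FA. All names here
# (fingerprint_hash, allow_silent_reset, known_fingerprints) are illustrative.

def fingerprint_hash(attributes: dict) -> str:
    """Collapse a set of browser attributes into one stable token."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()

def allow_silent_reset(user: str, attributes: dict, known_fingerprints: dict) -> bool:
    """Let an email-based reset pass only if the device matches a stored prior."""
    return fingerprint_hash(attributes) in known_fingerprints.get(user, set())

# A returning device matches a stored prior; an unknown device would escalate
# (e.g. to the manager-approval email described above).
prior = {"ua": "Firefox/111", "screen": "1920x1080", "tz": "UTC+1"}
store = {"alice": {fingerprint_hash(prior)}}
new_device = {"ua": "Chrome/111", "screen": "2560x1440", "tz": "UTC+1"}
print(allow_silent_reset("alice", prior, store))       # True: silent pass
print(allow_silent_reset("alice", new_device, store))  # False: escalate
```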


"It's relatively trivial these days to avoid fingerprinting attacks". Why should it be on me to avoid them? And more importantly, it's NOT trivial.


really? it takes a minute to set up a VPN and do your web browsing through a virtual machine. I guess it's not "trivial" for the average American, but it definitely is for the average terrorist or child pornographer, so it's easy compared to surmounting most other threat models faced by people intending to evade detection. Therefore, "trivial".

[edit] also, the less trivial it is, the better for corporate security.


The article describes "Fingerprinting as a Service". Some choice quotes:

    It doesn’t matter if you are using a VPN or Private Browsing mode, they can accurately identify you.


    Also note that VPNs does not help with fingerprinting. They only masks IP address.


right. but using a VPN plus a fresh VM running Ubuntu can mostly do the trick. In a pinch, just keep a few different versions of various browsers around when you plan to surf a site that you don't want associated with you. Or change your screen resolution or turn off your fonts.

My point was that fingerprinting is much more practical and useful as a positive form of identity verification than it is as a tracking device, as long as it isn't (and hopefully never will be) mandatory to lock into browsers.


Your point might even be that "fingerprinting is much more practical and useful as a positive form of identity verification" but we all know how fingerprinting tech is and will be used: to track users even more and try sell even more crap to them because that's what almost the entire internet is all about.

And as for this

> using a VPN plus a fresh VM running Ubuntu can mostly do the trick. In a pinch, just keep a few different versions of various browsers around when you plan to surf a site that you don't want associated with you. Or change your screen resolution or turn off your fonts

How do you plan to do all that on your mobile device, for example? Fingerprinting is a problem exactly like invasive tracking is a problem.


mobile devices present a problem when using fingerprinting for 3FA, and require frequent human intervention. This is a good thing.

Fingerprinting is inherently opaque. That's why it's such a good third level security measure. It's a lot harder to spoof and, if someone tries, a lot easier to isolate the attempt.


And you count that as "trivial" for a regular user? 90% of users don't know the difference between a tab and a browser, and you think they would know how to set up a VPN, a VM, and whatever else to avoid getting tracked?


That's sort of their problem, isn't it? It's not as if those people are reading this and concerned about their privacy.

If they don't care about privacy, they don't deserve it.


one point is that I may not have any specific sites I care about disassociating myself from. I just don’t want an aggregate picture to be built and sold freely.

Cliché example: I want to be able to buy a pregnancy test online but don’t want that information shared and remarketed to me. There is plenty of stuff like this. The impact of each privacy violation is small and often boring, but in aggregate it is corrosive to public discourse and individual wellbeing.


Look... to this and other (sib) posts I have total sympathy, but much better tracking can be done with cookies and other forms of client side storage. Which the 90% of people do not notice, clear, or care about.

Fingerprinting is by definition a lot more imprecise and vague. It's always going to be an issue if surveillance networks use it to pick out individual users. Whining about that is useless. It's also a valuable security tool and part of the landscape. Do with it what you can.


"Oh look it's that one dude with that weird Ubuntu device coming from an AWS IP again."


wellllll.... using your own AWS IP would definitely be dumb


I think you should read more about what fingerprinting actually is.


yeah? I use about 24 different parameters and/or their lack of ability to i.d. a machine. Pretty sure I understand how to turn that into a set of tolerances that can be compared with another machine to provide a reasonable projection of whether those match with the people using them. I think I get the concept.
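A minimal sketch of what "a set of tolerances" could mean in practice. The parameter names and the 0.8 threshold are made up; the commenter's real 24-parameter scheme is certainly more elaborate.

```python
# Sketch of tolerance-based matching: compare two attribute sets and
# accept if enough parameters agree. Parameter names and the 0.8
# threshold are invented for illustration.

def match_score(fp_a: dict, fp_b: dict) -> float:
    """Fraction of attributes on which the two fingerprints agree."""
    keys = set(fp_a) | set(fp_b)
    return sum(fp_a.get(k) == fp_b.get(k) for k in keys) / len(keys)

def same_machine(fp_a: dict, fp_b: dict, threshold: float = 0.8) -> bool:
    return match_score(fp_a, fp_b) >= threshold

a = {"ua": "Firefox/111", "screen": "1920x1080", "fonts": 143, "cores": 8, "tz": "UTC+1"}
b = dict(a, ua="Firefox/112")  # browser updated, everything else stable
c = {"ua": "Safari/16", "screen": "1170x2532", "fonts": 61, "cores": 6, "tz": "UTC-5"}
print(same_machine(a, b))  # True: 4 of 5 attributes still agree
print(same_machine(a, c))  # False: a different machine entirely
```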


I'm afraid your view is how the journey to the "world where submitting to such fingerprinting actively was mandatory" starts. Something about frogs in very warm water.


I upvoted this because it's the only smart comment to my post here. This is the ultimate concern.

That said, fingerprinting is only useful as a third security measure because most people don't understand its mechanics. The mechanics of avoiding being tracked are pretty basic. If our country required browsers or computers to transmit their fingerprint, people would find ways around it and it would stop being useful as a security metric.

Put another way, the moment this becomes a feature of an oppressive regime, it's one of the easiest things to work around. The obscurity is what makes it remain somewhat useful.


(1) Users should not receive passwords via e-mail. (2) How very enterprisey of you to even be able to send passwords, which one also should not be able to do. (3) Users can change or modify their browser, either to another browser entirely or through installation of addons. The fingerprint is not guaranteed at all to stay the same or similar.


This is an uncharitable reading of the comment. "Retrieve via email" can just as well be understood as reset using an email flow, as is common on most websites. And the comment does not claim they rely on fingerprints never changing, they say that if you do have a matching fingerprint, you can use that instead of another procedure.


(1) There is nothing wrong with sending a password via email. Even if you send a reset link instead, an email provider could steal that too.

(2) The server gets sent your password every time you log in. You shouldn't rely on a server operator not knowing your password.

(3) You can tune how sensitive the system is in response to changes in the fingerprint. Even if there is a failure to match, that just means authentication will be extra strict.


(1) Best would be a one-time-use link though, so that a user can detect whether the link was stolen from their inbox. I think you also did not get my point: the service should not know the password at all, usually not even initial passwords for any account. It is simply bad practice to ever have knowledge of user passwords, except as a salted hash. So I say you are wrong.

(2) The server gets sent the password via the default communication channel, the browser (TLS, hopefully), not via e-mail, possibly into an inbox that is third-party controlled (say, a Google Mail inbox for example).

(3) That does not make it right. Did they even ask the users for their consent? Did they learn anything from the GDPR or from discussions about consent in general? Or are they just a higher being, allowed to decide for their users what data about them gets tracked?

Many things can be done using technology. The question is always: Should we do them? That is a question about ethics, not technological possibility. We already have far too many businesses not caring about ethics at all, we do not need any additional ones.


(1) How can the user detect it? The service can request a password reset at any time. Most alerts go through emails, which the provider can hide. It's only bad practice because password reuse exists and people trust services not to exploit that fact.

(2) That is how it usually works.

(3) You can collect the information for security purposes just fine under the GDPR.

Providing a better user experience while maintaining a similar amount of security is a net positive.


(1) A tool like for example https://github.com/pglombardo/PasswordPusher self-hosted offers a way for the user to detect, whether their password has been seen before.

(2) Are you missing the point? "That is how it usually works." -- So why then send a password to an e-mail inbox, which, like I said, is a location often controlled by a third party, and often one with no good record of respecting privacy, if you can completely avoid that?

(3) OK, seems like we did not learn about consent. Why don't you ask your users whether they are OK with it first, instead of assuming and basing it on what is legally possible? Is ethics something too far out of reach?

Lastly, a word about what you call security: your so-called security is observed often enough to result in inaccessible accounts. "Extra strict" usually means something along the lines of "oh, now I am going to require your phone number, to send you a message on a second channel to make sure", or "solve these captchas for this untrustworthy third-party provider and I will trust their word about you having solved them correctly" (again being tracked, of course...), or similar things. Again circumventing consent, because now it becomes extortion, extracting more personal data so that the user can access their account. Your so-called security makes for a really shitty user experience and punishes the user for ever switching their browser.

So what does your "extra strict" mode entail? How are you going to be "extra strict" without any extortion? Are you implementing your own captchas by any chance? Or something similar?


You seem to have a conflict of interest here. How can you accept this for employer/employee but at the same time say it's not OK for a person to submit to fingerprinting? Employees are also persons.


Because the software is only allowed to be used on company computers and a few personal devices which have to be approved by upper management. It isn't fingerprinting the person or the public. It's checking that the software is running on a known/approved machine.


This technology could be used outside the company's garden one day. The current employer/employee picture could be seen as a miniature of society; e.g. the Great Firewall is only used on Chinese citizens and is controlled by the government of the People's Republic of China.

But I probably interpret too far here and it seems that in some industries e.g. secret agencies need to use unconventional methods.


Dunning and Kruger agree.


Using the IP address & user agent alone already gives you nearly 100 % accuracy, so the fact that they can re-identify you when these things stay identical isn't surprising at all. I tested that website as well and if you take care to rotate your IP address their re-identification rate becomes abysmal, especially if you're using a privacy-focused browser and extensions like Privacy Badger / uBlock.


And if we ever migrate to ipv6, the IP alone may be all anyone needs to fingerprint you.


Every device implements privacy extensions, which change the address every 24 hours. It's no longer based on the MAC and hasn't been for a long time.


Exactly. IP address identification is the elephant in the room that the article just briefly mentions. Nearly all websites that want to target ads at you use it. It's just so simple to use, and you can't switch it off like you can with cookies, except of course by using a VPN, which almost nobody does.


I often see the narrative on here that consumer VPN providers are almost useless for privacy due to other fingerprinting methods, which I've never really bought.


I just tested at fingerprint.com using mullvad.

Brave browser, no VPN, they recorded one visit, one IP.

Brave browser, no VPN, incognito, they recorded two visits, one IP.

Brave browser, with VPN, incognito, recorded three visits, two IPs.

I'm pretty impressed / surprised. A fresh incognito session, through a VPN, still matched the same fingerprint. Especially surprising since TFA indicates Brave randomizes the fingerprint. I even changed my fingerprint block setting to "strict, may break sites" and it's still recording the same visitor ID from Brave, even with incognito.


Just tried this with VPN + firefox resist fingerprinting. I cleared cookies and session data and reloaded, it recorded 1 visit 1 ip for the first (obviously) and second tries. I did not change my VPN connection between attempts.

Based on this test I'm surprised Brave's fingerprint resister did not work for you. But on firefox the enhanced privacy protection (strict) and the resistfingerprinting option are two different knobs.


I use the usual adblocker UBlock and:

* https://addons.mozilla.org/de/firefox/addon/canvasblocker/

which prevents fingerprinting via Canvas elements, additionally warns you if a site does it. There are more sites out there than you would assume. Some stupid blogs even.

* https://addons.mozilla.org/en-US/firefox/addon/multi-account...

This splits your tabs into different categories, each with their own cookie storage.

The fingerprinting website in the article didn't manage to correlate me visiting the website concurrently from two distinct container tabs.


The fingerprinting website in the article didn't manage to correlate me visiting the website concurrently from two distinct container tabs.

But that's merely because of the canvasblocker (or something else you have), because just separate containers doesn't cut it?


You can try https://www.amiunique.org/fp to get a view of all params can used to track you


It's interesting that they can narrow me down to less than 0.1% with just my language list (en-US,en,fr,ro). My user agent is practically unique as well, since I'm running an unusual configuration. I've never thought of that as a disadvantage when it comes to tracking, hah.


I observed this too, but I cannot really believe it. For me it finds just German on the iPhone. I get 0.88% for it. But if all Apple devices do it the same, I can hardly believe this already provides such selectivity. The problem with such test sites seems to me that only nerds visit them, and therefore the database is small and biased.


I have "prefer English, German as fallback". That alone makes me almost unique as well. Not fully (like your special config :D), but enough that other resist fingerprinting options become meaningless.


They narrowed me down to an order of magnitude less based on just my browser user agent (latest Firefox Android). I'm not sure what that actually means.


I'm guessing it means people don't use Firefox, and people _really_ don't use Firefox android.


Nope their database is 40% Firefox.


Almost no one visits that site so their data set is very small


I like this site for the info it provides on how tracking is done, but the data set it generates uniqueness from is really tiny and differs a lot from real-world browser makeup.

For instance it claims iOS is 4.63% of users and Safari is 3.42%, when all other, more complete statistical sources put those numbers closer to 20%-30%.


Ironically, that site has a cookie banner. This made me confused as to whether I should accept cookies or not.


so in order to stay anonymous, one can clear these parameters; alternatively, one can generate different parameters for every HTTP call.


No, any session-based protocol (HTTPS) would expect certain characteristics to stay the same within a session.

If it changed with every call they'd just block you as a bot.


I wish browsers did more to combat this. There should be ways to randomize or normalize every bit of information they try to gather.


It’s a double edged sword you need to walk the edge of. Almost everything they use to fingerprint you has a fully legitimate use case which is why it was added.

The more you do to prevent fingerprinting the more you hobble the web as a platform. A lot of restrictions that got placed on the canvas tag to help prevent fingerprinting for instance really limited its functionality.

In my opinion a workable solution would be to make more of these things opt-in, with the end user granting the page access to high-accuracy data.


Most of those APIs should be default closed. Incognito should definitely be default closed.


But it's not just a matter of "open"/"close". It's more like signal/noise. Much of the signal is legit: source IP is needed to deliver response, screen resolution, audio/video codec support, transfer protocol, cache headers are all needed to render the page correctly and as quick as possible.

Unfortunately, much of that signal persists across sessions as well as websites and can therefore be aggregated into a hash that works as a "super cookie". The signal is based on the device, the connection, not so much the HTTP/HTML you're looking at.

The best mitigation approach is therefore adding noise: add random gibberish to the User-Agent, tunnel IP through VPN/NAT, lie about codecs or screen resolution.

While that degrades the user experience, it gives no guarantee of actually preventing fingerprinting. So the good news is that fingerprinting is hard too, and doesn't work as well as is usually claimed!


Randomization works if you opt in everyone without their consent. If your addon or minority browser randomizes data you're adding a signal.


Yes, but that's a poor signal. If only two users add "enough" noise to their signal, fingerprinting will only be able to prove that a user added noise, but not which user did so. For a single site doing the fingerprinting, that is.

Compare that to tracking users across multiple sites for proper signal without randomization.


Yeah but if it's opt-in for privacy concerned users there may well be two users in the world with identical basic metadata (browser version, platform, etc) who have this enabled. And telling you it was one of two users but not which is pretty shite anonymization.

Regardless it's still adding an extra bit of information leaked, so you may as well forge a common value rather than make something new up.


Which then pushes a lot of web use-cases into mobile apps locked to a few corporate platforms that make tracking much much easier. Yes, even iOS.


Which is why we have regulations to force these platforms to open up.

Asking for users permission should and is slowly becoming the default on phones as well.


Well, we could fingerprint the fingerprint detection code ...


uBlock Origin in default deny of 3p scripts basically achieves this already.


UBlock (and even AdGuard) is not preventing this website from accurately fingerprinting me.


If Javascript is enabled there’s ultimately very little that can be done to prevent fingerprinting. If you don’t want to be fingerprinted then only allowing JS to run on allowlisted websites is the only way to truly be safe


Well, and stuff like the resistFingerprinting=True option in Firefox. As described in the article. You can make your browser to just lie to the JS API.

There is a price, of course. Lying about screen resolution might mess up how the website looks. Lying about which fonts are installed might make the site a bit uglier.


As someone said already, 'resistFingerprinting' option should be configurable per-domain. Then we could have it enabled (randomized) for most of the web and disable it (allow fingerprinting) for payment processors and similar 'trusted' websites.


EDIT: there actually appears to be a hidden per-domain whitelist privacy.resistFingerprinting.exemptedDomains


If you don't pay attention to it you might be surprised how non dynamic your residential internet last mile DHCP assigned IP really is. It's not uncommon to go many months or a year with having it always renew to the same address. That, combined with all the fingerprinting mentioned in the article...


This would be one of the things about IPv6, we'd have lifetime fixed IP addresses (or address ranges at least). Wouldn't we?


There's "Privacy Extension" for that, from https://labs.ripe.net/author/johanna_ullrich/ipv6-addresses-...

> The IPv6 Privacy Extension is defined in RFC 4941. It is a format defining temporary addresses that change in regular time intervals; successive addresses appear unrelated to each other for outsiders and are a means of protection against address correlation. Their regular change is independent from the network prefix; this way, they protect against tracking of movement as well as against temporal correlation.


Thanks, can you answer a couple of questions:

So carriers (ISPs) would still need to do NAT to provide that? The RFC didn't seem explicit (I skimmed).

Isn't the removal of traffic processing a large part of the sell for IPv6?

Also, surely the ISP can sell IP-to-user correlation lists, as I assume they do now? They can presumably do it anonymously, but with some other party holding the other part of the data that allows deobfuscation of users (e.g. to comply with GDPR)?


The way it works is that the ISP assigns the home user's router a prefix (e.g. 64 bits). Devices on the home network pick a random address within that prefix, and regenerate it periodically, keeping the old address alive for a while too.

Only the router needs an IPv6-to-MAC-address map (it always needed that, this was no different with IPv4). The ISP just has a static route that sends all traffic matching the prefix to the router.

With this you can still easily recognize households by IPv6 prefix, but at least you cannot reliably distinguish devices within that household.
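The scheme above can be illustrated with Python's `ipaddress` module. The /64 size matches the explanation; the prefix itself is just the IPv6 documentation range, used here as an example.

```python
import ipaddress
import secrets

# Illustration: the ISP delegates a /64 prefix to the household; each
# device picks a random 64-bit interface identifier inside it, much like
# RFC 4941 temporary addresses. The prefix is the documentation range.
prefix = ipaddress.IPv6Network("2001:db8:1234:5678::/64")

def temporary_address(net: ipaddress.IPv6Network) -> ipaddress.IPv6Address:
    """Pick a random address inside the delegated prefix."""
    return net[secrets.randbits(128 - net.prefixlen)]

addr = temporary_address(prefix)
print(addr in prefix)  # True: still routable to the household's /64
# Successive addresses look unrelated to outsiders, but they all share
# the household prefix, so the household itself remains recognizable.
```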


Hmm, I feel like I should have known this! Nice clear explanation.


No ISPs don't need to do NAT but you can if you like. You also don't have to do NAT with IPv4 if you only have one device or get a subnet from your ISP. It's just done because we don't have enough v4 addresses.

They can do subnet to customer correlation. The IPv6 is randomly generated by your device if you use SLAAC. But if your ISP is an adversary you have pretty much lost anyway. If they provide you with a router they can see all devices in your network (MAC and hostname) and they could also map certain devices to certain port ranges and sell that too.


I'm sorry, you've misconstrued the question. The context is the privacy extension to IPv6 under RFC 4941. So my question was: would ISPs need to do NAT in order to provide that extension? I only skimmed the RFC, but there was no other obvious way to me for it to be provided that wouldn't fall to an adversarial ISP, because it appears they must do NAT to make that work.

AIUI ISPs provide a fixed prefix to customers. So I'd need to look at how SLAAC would work if it uses a random IPv6 address; surely your ISP is only allowed to use the limited set of numbers allocated to them by IANA or whoever.


No, they don't need NAT; that is simply called routing. The ISP sends every packet that is in your assigned /64 to your router's IP address. It's called prefix delegation [0].

Yes, they get a /32 by default (at least in RIPE); larger allocations need justification. But there are 2^32 /64 subnets in a /32, so every ISP gets a complete IPv4 internet's worth of /64s they can assign to their customers at will. Your device assigns itself a random IP address from the /64 network your ISP gave you via prefix delegation.

[0] https://en.wikipedia.org/wiki/Prefix_delegation


DHCP and NAT are perfectly compatible with IPv6.


Yes, but will ISPs provide it? I thought the removal of these was one of the selling points?


Then IPv6 adoption must be frustrated at all costs, in the name of liberty ...of course.


Yes, the IP and the user agent string appended and CRC’d is a good enough fingerprint for basic web analytics, like for identifying returning customers.

The false positive and negative rates are reasonable, and false positives (new customer seen as returning) could be further reduced by browser feature testing.
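For what it's worth, the "appended and CRC'd" scheme above is about two lines of code. A sketch; the IPs and user agent below are example values.

```python
import zlib

# Sketch of the basic-analytics fingerprint described above: concatenate
# IP and User-Agent, then CRC32 the result into a compact visitor ID.
def visitor_id(ip: str, user_agent: str) -> int:
    return zlib.crc32(f"{ip}|{user_agent}".encode())

ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/111.0"
a = visitor_id("203.0.113.7", ua)
b = visitor_id("203.0.113.7", ua)  # same visitor returning
c = visitor_id("203.0.113.8", ua)  # neighbor on the next IP
print(a == b)  # True: recognized as a returning customer
print(a == c)  # False: one changed byte already breaks the match
```

CRC32 is fine here precisely because this is analytics, not security: collisions only show up as the false positives mentioned above.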


Indeed. The IP looks like a strong signal in the overall game of fingerprinting.


Apple added iCloud Private Relay for that.


It is interesting that the site can easily fingerprint individual profiles/directories:

For example

chromium-browser --user-data-dir=/tmp/profile_A

chromium-browser --user-data-dir=/tmp/profile_A --incognito

chromium-browser --user-data-dir=/tmp/profile_B

chromium-browser --user-data-dir=/tmp/profile_B --incognito

For each command + its incognito it can detect them as separate profiles.

For ultimate privacy one needs to launch the browser with a new profile every time.


... on a new computer, each time ordered from a different brand and reseller, paid with a unique type of cryptocurrency and delivered each time to a new dead drop in a different country.


I tried a live boot of Ubuntu. Every time it can detect me accurately. Looks like the whole privacy thing is OVER, unless lawmakers do something (i.e. not going to happen!).

At least they could use this to replace reCaptcha, and make passwords disappear!


Ubuntu has a lot of unique information that is readily accessible.

Machine-ID in /etc being one, but there are various other items that can be used the same way, from D-Bus activation and something like 20 other places, plus a large number in snap.


Websites can access machine-id?


I've heard from people I know to be scarily skilled in that area that it's possible through the D-Bus interface. Mind you, this was years ago.


I guess they can't, unless somebody had a great idea in the specification of some web API...


That there are OS-level identifiers is, I think, a different discussion. I wonder why these were cited in the context of the fingerprint.com discussion.


How does the fingerprinting know the payment method you used to pay for the computer, is that stored somewhere in the operating system? How would they know it was a dead drop also? Genuinely curious.


It's just a precaution for when they eventually breach your OS and dig out the machine's serial number from the BIOS. This will allow them to trace you to the reseller you used. But if you ordered to a dead drop in a random country and paid with a different cryptocurrency network each time THEY gain exactly zero information to profile you.

That, or the game against pervasive web tracking is lost.


This is actually very useful information. I think I know how I’m going to buy my next computer, but I wonder which manufacturers support the drop as a shipping option?


It was a joke lmao


Someone not getting a joke is funnier to you than the joke itself.


I was also joking, which makes this thread even more funny.


delicious.


Do these profiles clear their cookies after each request? I assume if the service finds a matching cookie, it will prefer it, or at least use it as an extra identifier.


Technically one can create this and launch a new profile every time. It can still detect the device (with some failures if I change the screen resolution/DPI). Maybe after 3 or 4 times, the server may also detect that a certain IP address is trying the same thing.

TEMP_DIR=$(mktemp -d /tmp/chromium.XXXXXXX) ; /usr/bin/chromium-browser --user-data-dir=$TEMP_DIR

In the end, as others say, they use hardware information + IP + other stuff. It is a lost battle.


But how could it distinguish different profile directories if they use the same settings? I would assume profile IDs, directories, and the like should not be exposed through the browser. I am not used to chromium-browser (is this Chrome? forgive my incompetence), but I wonder what kind of profile-specific static identifiers besides cookies could leak out of the browser?

Maybe these? https://browserleaks.com/webrtc But at least FF in private mode should randomize these IDs on restart.


Is this _only_ fingerprinting then? If the profiles are different, do they manage to extract some UID from the profile (which I would assume is a bug in the browser), or do they store data client-side using persistent storage APIs?


Chrome does give access to localStorage/sessionStorage in Incognito and this can be used to communicate between tabs on the same domain, but just like cookies and cache this data is wiped if you close the Incognito instance.

It's certainly a mystery, because you'd expect any capability fingerprinting (some combo of UA, extensions, CPU/GPU specs, IP etc) to give an identical result between profiles, so it does seem there's some per-profile difference. But I can't think of any browser API that exposes something like an ID...


Then couldn't we get a trace of the properties it uploads to the server by analyzing what is executed in the JavaScript? Surely it has some sort of submit endpoint where it sends the individual values.


POST https://fpa.fingerprint.com/?ci=js/3.8.10&ii=fingerprintjs-p...

It looks like it is using heavy obfuscation.


Scrolling a bit through the mess, it seems it is, for example, trying to detect the ad-blockers in use.

.... adGuardGerman:[u("LmJhbm5lcml0ZW13ZXJidW5nX2hlYWRfMQ==") ....

I see things that look like font fingerprinting, CSS, Apple Pay detection, ..., msPointerEnabled, ..., webkitResolveLocalFileSystemURL, ..., cookie settings, ..., the mathematical library used (sine, cosine, ...), service workers, ..., RTCPeerConnection, hardwareConcurrency,

Maybe we could dissect it and analyze the full list?

In some other place, they documented that you can e.g. get the light/dark theme information out of CSS. It doesn't even need JS to do it.


Did you check amiunique.org as well with these?


Even after discovering it is worse than they thought, it remains far worse than the author thinks.

Public knowledge is far behind the actual capabilities in practice.


This is easy to say, but not always true. Can you elaborate about concrete details of the "capabilities in practice"?


I can but frankly I'm pretty happy about the lack of lawsuits in my life currently.


I don't understand the test on this page. It says we should be worried because a fingerprinting website generates the same hash even after you clear your cache and site-data, and even if you go into a private tab. But I'm not overly concerned by this, provided I share that hash with other people.

The worry would be that the hash is unique to me (i.e. a fingerprint), but I don't see the evidence that it is.


Unfortunately the many, many browser capabilities have given adtech enough entropy to actually create globally unique fingerprints. You can lookup yours with an estimation of uniqueness here: https://amiunique.org/fp
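How those many capabilities add up to a globally unique fingerprint is just entropy arithmetic. A back-of-the-envelope sketch, with invented attribute frequencies (sites like amiunique.org publish real ones):

```python
import math

# Made-up fraction of users sharing each attribute value (illustrative only).
attribute_share = {
    "user_agent": 0.05,        # 1 in 20 users send this exact UA string
    "screen_resolution": 0.10,
    "timezone": 0.20,
    "installed_fonts": 0.01,
    "canvas_hash": 0.001,
}

# Assuming the attributes are independent (optimistic, but illustrative),
# the surprisal in bits simply adds up across attributes.
bits = sum(-math.log2(p) for p in attribute_share.values())
print(f"combined entropy: {bits:.1f} bits")
print(f"narrows you down to roughly one in {2**bits:,.0f} people")
```

About 33 bits suffice to single out one person among 8 billion, and each extra attribute a script probes adds a few more bits.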


The likelihood that you have the same hash as other people is exceedingly small.

So if I fingerprint you on a site which is using my commercial fingerprint service, then I can sell your hash to other places and tell them all about your browsing habits. The more places run my fingerprinting service, the more data I can collect on you.


I understand the principle. I'm saying that the test on this page isn't demonstrating uniqueness, and so isn't demonstrating fingerprinting.

The first time I heard about fingerprinting was with EFF's Panopticlick, which stated how many hashes had been generated from visitors, and how many you shared with them.


I agree with you about uniqueness, but being unique doesn't matter with respect to their claims.

Any educated person with sufficient math knows the mathematical structure of a hash will never be unique. It's a Galois field, or 'finite field', after all.

The core of this issue is the flawed but convincing belief, promoted by an entire industry, that if the probability is sufficiently low, it's unique, and, following these axioms, that if it's unique it's an individual person (eyeball).

Under that assumption, all you need to do is collect fields of information that are variable and group them together such that they yield a sufficiently low probability; I think currently that threshold is about 1 per million. It's a very clever way to defraud advertisers if you think about it. You create an exaggerated market, and charge for each advertisement view.
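A toy uniform model puts that 1-per-million threshold in perspective at web scale (the bucket count and user counts below are round illustrative numbers):

```python
# If a fingerprint has 1-in-a-million discriminating power and fingerprints
# were spread uniformly, how many people would share each "unique" one?
buckets = 1_000_000
for users in (1_000_000, 100_000_000, 5_000_000_000):
    per_bucket = users / buckets
    print(f"{users:>13,} users -> ~{per_bucket:,.0f} people per fingerprint")
```

At a billion-plus users, "1 in a million" is thousands of people per bucket, not an individual eyeball.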

In my opinion it's just flawed thinking, but there are some real fanatics out there who subscribe to this dogmatically.

For example, applied probability is used as part of protocol design when accounting for binary erasure channels in things like cell phones. You shouldn't be able to have communications blocked in only one direction, or altered without it being noticeable, but stingrays may have the ability to do this according to the limited documentation that has been released so far.

Probabilities in general have real problems with validity at a fundamental level. I think the most common approaches today are the axiomatic approach and the frequentist approach; both have significant limitations and often break down when self-reference is indirectly introduced.

So I guess I'll ask, are you a believer?


It's enough to narrow you down to a specific bucket. E.g. "affluent white young male in his 30s in a specific neighbourhood" and serve you ads and news. Collate with a few other sites (even airline checkouts and boarding pages have tracking), and you have a close enough match.

The worst part of this? Trying to hide from fingerprinting makes your fingerprint more unique.


> Trying to hide from fingerprinting makes your fingerprint more unique

Didn't seem so in the experiment in the article.

Sure they'll be able to place you in the bucket "tor user", but is that really more narrow than what you'd get without Tor?


Probably. Tor is not something most people use unless they have something to hide.

No, I don’t care about the one time you downloaded a gentoo iso over tor.


Using Tor might stop them from tracking you in a unique way, but they can for sure put you in the basket of the 0.1% of Tor users.


You can easily do this by looking at the IP address. All exit node IPs are public, you don't need browser fingerprinting for that.


Here is evidence that the hash can be unique, or can narrow you down to a small group of people:

https://coveryourtracks.eff.org/


This is a nice website.


Sure, the advertisement graph only showed recall and not precision. Maybe everyone gets the same hash! That'd explain their excellent results.

However, I doubt that's a problem in practice. I'd assume these fingerprinters know what they're doing. It certainly seems so.

How could one run an experiment collecting lots of these fingerprints and determine the false positive rate?


The site they use gives you a visit counter. So, if that matches what you did, you didn't clash with any other user's hash.


I don't think this is a proper way to test it.

It matters more how unique your fingerprint is than how consistent or reproducible it is. Just testing if you get the same fingerprint back on your second visit doesn't tell you much if you don't know how many people "share" your fingerprint.

As a silly example, if you gave all users the same fingerprint, it would be very consistent but also useless as a tracking method.


On the demo they have previous visits from your fingerprint listed (so you can get an idea of how common the fingerprint is... at least once enough people have tried the demo). Mine had two visits listed which are not mine when using Safari and private relay. On Firefox there were none (and it tracked in normal and private mode).


On my iPhone, I was surprised to see I had made 20+ visits when in reality that was my first visit to fingerprint.com. It makes me feel better that my fingerprint isn't actually unique!


Can we fingerprint fingerprinting code and block it? At first glance, code accessing all kinds of unrelated high-entropy APIs seems like something detectable. But static analysis might be too hard in the face of obfuscation, so it would have to be done with dynamic analysis, which kind of means you let the fingerprinting happen but are now at least aware of it. So how do you prevent the fingerprint from being used? In principle one could mark values from entropy sources as tainted [1] and taint all the variables potentially influenced by those values, preventing them from leaving the browser. Not sure if this would be practical, and I am even more skeptical that this could be easily added to existing browsers.

[1] https://en.wikipedia.org/wiki/Taint_checking
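The taint idea can be sketched in a few lines. This is a toy model, not anything a browser actually implements; the `Tainted` wrapper, `send_to_network`, and the `navigator.hardwareConcurrency` label are all just illustrative:

```python
class Tainted:
    """A value read from a high-entropy source, carrying a taint label."""

    def __init__(self, value, source):
        self.value = value
        self.source = source

    def __add__(self, other):
        # Anything derived from a tainted value stays tainted.
        v = other.value if isinstance(other, Tainted) else other
        return Tainted(self.value + v, self.source)


def send_to_network(payload):
    # A taint-aware browser would refuse to ship tainted data off the machine.
    if isinstance(payload, Tainted):
        raise PermissionError(f"blocked: payload derived from {payload.source}")
    return "sent"


cores = Tainted(8, "navigator.hardwareConcurrency")
derived = cores + 1000              # taint propagates through the computation

print(send_to_network("harmless"))  # ordinary data goes through
try:
    send_to_network(derived)
except PermissionError as e:
    print(e)                        # the derived value is blocked at the edge
```

The hard part in a real engine is exactly what the comment suspects: taint has to survive every operation, including string formatting, DOM round-trips, and timing side channels.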


The short answer is no, you can't block it, because blocking identifies to the site owner that you are blocking it (as a negative match).

You'd have to mask every informational API with a plausible, suitably corrupted alternative.
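A sketch of what such a "plausible corrupted alternative" could look like. The API names and value pools are invented for illustration; the one real design point is seeding per session, so the lie stays self-consistent while a page is open (inconsistent answers are themselves a red flag) but rotates between sessions:

```python
import random

# Illustrative pools of values that real browsers commonly report.
PLAUSIBLE = {
    "hardwareConcurrency": [4, 8, 12, 16],
    "deviceMemory": [4, 8],
    "timezoneOffset": [-480, -300, 0, 60],
}

def spoofed_profile(session_id):
    """Fake but plausible answers, stable within one browsing session."""
    rng = random.Random(session_id)  # same session -> same fake answers
    return {api: rng.choice(values) for api, values in PLAUSIBLE.items()}

print(spoofed_profile("session-1"))
print(spoofed_profile("session-1") == spoofed_profile("session-1"))  # stable
print(spoofed_profile("session-2"))  # new session, likely different answers
```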


This conflates whether it can be blocked with whether blocking would be effective. Every time you do something unique, you of course become identifiable. But the idea is that you are not the only one blocking these scripts, which makes only the group as a whole identifiable. You could, for example, block those scripts with a widely used ad blocker, which would make you stand out no more than any other user of that ad blocker. It would probably not be too effective, as the URIs and file hashes ad blockers match on are relatively easy to change, but in principle you do not have to become uniquely identifiable by blocking fingerprinting scripts if enough people do so.


> This is confusing whether it can be blocked and whether it would be effective.

It shouldn't be confusing, because it's really fairly simple.

The gist is this... so long as determinism holds as a system property, you can leak information through the absence of something when compared to another expected thing. This is how inference works in many respects: you have properties, and from whether those are present you can deduce or infer additional information that is not explicitly given.

Most gifted problem solvers probably couldn't tell you that is what they are doing, because it's unconscious, often a result of years of observation.

Computers fundamentally require certain system properties, such as determinism, to do work in the first place, and you can inject non-determinism into those processes in ways that won't break the underlying subsystems, as long as it stays within an expected range; that can make the result indeterminate. An indeterminate fingerprint is useless.

In the case of outright blocking the code from running, you leak the information that you are blocking it, since they expect a range of values back and a null (the response when nothing gets sent back) is itself a value (or state, if you want to be technical).

The site then only has to test for this semi-unique case (i.e. a null represents the single group of people who are blocking it) and then prevent the site from responding to you. They are the gatekeepers because they control their infrastructure.

Incidentally, this is how almost all the are-you-human tests work. They collect an intrusive fingerprint, and if it's within a range that corresponds to a known spectrum of user values, they allow you to continue to the site.

This is a rough boiled down explanation, it can get quite a bit more abstract and technical when talking about whether determinism is present (i.e. how do you test for it).

Ultimately, if you can understand determinism, you fundamentally understand the limits of computation and how computers work at a barebones level. It also gives you the ability to find whether certain types of problems are impossible (and thus you don't waste your time on them, or unproductive avenues).


I meant that you are conflating whether such scripts can be blocked with whether blocking would be effective. That they cannot be blocked would require some fundamental or practical obstruction, like having to solve the halting problem or whatnot. You said they cannot be blocked because the act of blocking itself makes you identifiable, which defeats the purpose, but that is not true, or only true if just you or a few people block the fingerprinting. When enough people block it, the act of blocking does not make you identifiable, or at least to a much smaller extent than not blocking the scripts would.

In principle you could build a browser that, to a good approximation (you will realistically not be able to prevent timing leaks, for example), does not leak any information about the system it is running on; just isolate it from the host system. With more effort you could let details about the host system leak into the browser but not out onto the network. Well, some information has to flow from the host system through the browser and out over the network, for example keystrokes and mouse movements, otherwise you could not do much with this browser.

So you could still try to fingerprint users by their choice of words, typing patterns and things like that, maybe made a bit harder by filtering out high frequency timing information. But at least one could no longer simply fingerprint the host system. Which would again be a trade-off, your screen resolution is not only good for a few bits of a fingerprint but can also be legitimately used to serve content at a proper resolution.


The practical obstruction is that if you block the browser API access, they test for it and then block you so you can't view their site, and they do that by using the information that you leaked.

As for the effectiveness of blocking what they are doing, you can defeat them by giving them bogus but plausible data. There's a range of accepted values for those interfaces. If it's within that range, they can't tell the difference; their entire assumption is that the result is determinate, when it can never be, because a hash is a finite field.

Their assumption is based on applied probability which has validity issues in an adversarial environment.

Neither requires solving the halting problem.


EFF has an excellent tool to check your browser's fingerprint uniqueness:-

https://coveryourtracks.eff.org/

I use a lot of browser extensions. Unfortunately, this makes my browser easily identifiable.


When I switched off fingerprinting in this browser, the font size here on Hacker News changed. I suppose it just uses the user agent to set a certain font size, or does Hacker News track based on fingerprinting?


The CSS for HN is very terse, and aside from a mobile-specific set of rules it doesn't really do any variation.

Is it possible you had set the zoom level previously, which the browser remembers between sessions, and turning off the tracking reset the zoom to 100%? Do you have any extensions like Greasemonkey or Stylus for per-site customisation?


You mean `privacy.resistFingerprinting` in Firefox? I guess that disables custom font configurations.


This author doesn't seem to know what they're talking about. Just because the service generated the same ID for their Chromium sessions doesn't mean that applies to all users' Chromium sessions. Chromium just exposes more of the machine. My guess is that they, writing a computer-technical blog post, have a particularly unique machine. Even having 16 GB of RAM separates you from the masses and might make you unique depending on the graphics card etc.

The fingerprinting discussion is relatively new. The first research paper’s author is only 35 or so. (Its title is Cookie Monster.) The discussion is also a little amusing on a site like Hacker News. A perfect example of someone who’s easy to fingerprint is someone who built their own computer (likely to be found on HN). On the opposite end of the spectrum, Safari iPhone users with the same model are impossible to distinguish.

There’s a paper out there where the researchers worked with a public entity’s website to get more accurate fingerprinting data. There are very few unique fingerprints in reality and therefore no reason for any company to track them. This tech probably won’t ever identify users uniquely.

There are actually some positive aspects of fingerprinting. Tor leaves a very obvious fingerprint, and it’s easy for banks to detect its use by criminals.


I wonder if https://jshelter.org helps with that. And if it's not too slow.


Mull (an FF fork on Android) and LibreWolf (an FF fork on desktop) have privacy.resistFingerprinting = true by default. Highly recommended!

https://f-droid.org/en/packages/us.spotco.fennec_dos/

https://librewolf.net/


Do they remember site zoom levels?


I have a sort of love-hate relationship with this stuff. On the one hand, yes, tracking me is bad if I am not aware of it; but on the other hand, I work for a company that uses its expert knowledge to help consumers purchase the right tools for them. Ideally we would like the end product to reward us for putting them in touch with the right customer that we've used our name to help land. Much like a hairdresser would recommend a certain brand of hairspray, or a mechanic who carries their preferred oil, there is always a need for a middleman: 'tell Bob I sent ya!'. Obviously this is an exception to the large majority of what tracking is currently in place for, but until we drop the whole 'tracking is bad, we should just shut it all down' and start to think of a fair and reasonable way for users to say 'I am ok with company B knowing that I have a relationship with company A', these increasingly nefarious tracking efforts will happen.


> until we drop the whole 'tracking is bad we should just shut it all down', and start to think of a fair and reasonable way for users to say 'I am ok with company B knowing that I have a relationship with company A' then these increasingly nefarious tracking efforts will happen.

Given how companies have completely and utterly ignored the idea of consent banners, I am deeply disinclined to believe that most companies would ever actually be satisfied with a user controlled choice in the matter. Where we actually are is that companies will relentlessly attempt to stalk everyone all the time no matter what, and in face of that the only sane conclusion is that in practice tracking is bad and we should shut it all down.


Actually valuable educational sales middlemen - making an individual aware of a tool that they didn't know existed but which solved a problem they already had - are such an infinitesimally small part of online advertising that those use cases can be completely ignored.

This isn't a situation where one rotten apple spoils the whole bunch, it's more like one good apple inadvertently was dropped into a toxic cesspool of rotten apples.


I'm just guessing here, but I'm fairly sure they use a model that updates dynamically as the "user" (or victim) changes his or her browsing settings, even when the user tries to hide. It sounds like some kind of Bayesian filtering going on, or some sort of Markov chain or decision tree. That is to say, their model tracks the likelihood that you're the same unique user reloading the page, based on the information it can glean from you.

This makes it exceedingly hard to hide from such a filter, because in communicating with these sites, you are bound to reveal at least some information about yourself. And then the "likelihood-machine" does the rest by connecting the dots, even if you gave them "fewer dots."
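That "likelihood-machine" intuition can be sketched as a tiny surprisal-weighted matcher. Everything below is invented (profiles, attribute values, population frequencies); the point is only that rare matching attributes score high, and hidden attributes just contribute nothing:

```python
import math

# Profiles stored from earlier visits, plus made-up population frequencies.
profiles = {
    "user_a": {"tz": "UTC+1", "lang": "de", "gpu": "M1"},
    "user_b": {"tz": "UTC-5", "lang": "en", "gpu": "RTX3060"},
}
population_freq = {"UTC+1": 0.2, "UTC-5": 0.25, "de": 0.07, "en": 0.5,
                   "M1": 0.1, "RTX3060": 0.05}

def match_score(observed, profile):
    """Sum the surprisal (in bits) of every attribute that agrees."""
    return sum(-math.log2(population_freq.get(v, 0.5))
               for k, v in observed.items() if profile.get(k) == v)

# This visit reveals fewer dots (language hidden), yet the match survives.
observed = {"tz": "UTC+1", "gpu": "M1"}
best = max(profiles, key=lambda name: match_score(observed, profiles[name]))
print(best)  # user_a
```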

It's also quite interesting - or perhaps chilling - to see how fingerprinting through NLP and other language tracking algorithms can also track just about any forum post you do, even if you're using a pseudonym.


Target and the model that found the pregnant girl (bad counter-argument here: https://medium.com/@colin.fraser/target-didnt-figure-out-a-t...).

There are three options:

1. Prevent/Stop it: This ship sailed long ago. Not to be grim about it, but Pandora's box got opened.

2. Fight it: Tool up, change your print, your behavior, your place. Build focused VMs that you use per topic. Simply do a WHOLE lot less. In the grand scheme, it's a lot of work for low return. Note: there are exceptions.

3. Increase Noise: The whole point of most data collection is to sell more to you. Because most people are sheep, a fairly simple model can be surprisingly accurate (over-targeting is an issue). Don't be a sheep: diversify, make more noise in the system, search outside your comfort zone, and change it up often.


Regarding the more noise strategy, Mozilla has this fun tool: https://trackthis.link/


I thought about the noise route, but doesn't that make you more unique? Maybe if many users share the noise, but then that makes it easier to identify what's noise and what's not.


Anything that doesn't impact the signal, or can be separated from the signal, does not qualify as noise.

You want to quickly throw targeting systems off your scent (or get them distracted)? See how sticky high-value sales are for the ads you see online. Start looking for a new car, use the word wedding too much (god help you if you're a woman), or say vacation 3 times near a search engine, and watch how quickly your ad experience changes.

This won't work "long term"

As an example: You get an ID as a 24 year old male, who likes his local sports ball team, drinks canned domestic beer... that's a profile that is perfect to sell you a BBQ grill and a subscription to the meat of the month club. Spend an hour or two a week pursuing sewing, the engine is going to get confused! Maybe you share a device with your wife, or she got on it...

This is the sort of noise you create; it's not random, it's "more", and you do it by going off-type for a while. Have a friend who is into something you aren't (music, art, and so on)? Ask them some questions and go spend a week getting more informed on their hobby, then have a chat with them. Suddenly the systems will see you as MORE...


Well, I'm glad to report that my efforts to fight fingerprinting have paid off.

I use a text based browser, with no js, no cookies, no css, no external requests past the first html page download, no user agent, no etag, I connect through Tor and I've modified the browser to randomize http headers. And of course, it sometimes happens that I want to see something that is refused to me with that configuration (like, seeing anything behind the big internet killer, aka Cloudflare - thanks archive.org for existing), so I have also a classic browser for the occasional lowering of barrier.

At first, I thought fingerprint.com did identify it, giving me the hZ4W5oQ7pJVIHbW2fBXA id. Then I realized it was giving the same id when using curl, with and without Tor. Then I realized, by googling and DDGing that id, that it's also the one reported to search engines. So it's not unique; it's basically a "dunno" reply.


Oh, finally I found a distinguishing element for the iPhone:

The zoom settings in the display/brightness section of the iPhone seem quite relevant to fingerprint.com's algorithm.

Toggling between standard/bigger text toggles the fingerprint value.

This could be because the visible area of the screen changes, as well as some value of the CSS fingerprint.


Fingerprint.com gives me different IDs across different tabs, and also in private mode. I guess the privacy setup still works somewhat. The stack I use:

- Firefox, Enhanced Tracking Protection ON

- Multi-Account Containers + Temporary Containers addon

- Privacy Settings addon, most settings private, but referrers enabled

- uBO with lots enabled, Decentraleyes addon


The EFF has tried to get folks to pay attention to this for years. See https://coveryourtracks.eff.org/ aka Panopticlick

And it probably understates the problem these days, missing some of the more recent techniques.


Fingerprinting is one of those things where there's really been a slippery slope, and we've just slid further and further down it over the last decade. Back when I worked at an ad-tech startup (almost 15 years ago) I ran an experiment myself with our data to see if a simple hash of IP, browser agent, and maybe a couple of other signals we had in our logs (don't recall) would correlate with the cookies we already had through cookie matching from other sources. And the answer was: yes, about 95% of the time. Enough to be reliable enough to do basic retargeting without worrying about excessive false matches.
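The shape of that experiment is easy to reconstruct. Everything below is a hypothetical reconstruction (example IPs, header values, the 16-hex-char truncation), not the original code:

```python
import hashlib

def crude_fingerprint(ip, user_agent, accept_language):
    """Hash a few passive signals into a pseudo-ID (illustrative only)."""
    raw = "|".join((ip, user_agent, accept_language))
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# Two requests with identical signals collide into the same pseudo-ID...
a = crude_fingerprint("203.0.113.7", "Mozilla/5.0 (X11; Linux)", "en-US")
b = crude_fingerprint("203.0.113.7", "Mozilla/5.0 (X11; Linux)", "en-US")

# ...but the ID breaks as soon as any signal changes (new DHCP lease,
# browser update), which is one reason the match rate was ~95%, not 100%.
c = crude_fingerprint("203.0.113.8", "Mozilla/5.0 (X11; Linux)", "en-US")
print(a == b, a == c)  # True False
```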

But at the time, it was considered to be a big do not touch -- just don't do this. Not so much for ethical reasons, but for optics in the industry. (I wasn't proposing doing it, was just curious)

In the meantime, though, this seems to have just become standard practice, but way more sophisticated with way higher accuracy, as this article touches on.

What was not acceptable a decade ago is now "ok." Not just by sketchy ad startups, but by major players.

But this whole mess ties back to one of the things that worries me the most about the propagation of LLM type ML out into the general industry. It's only a matter of time before ad targeting takes on an extra dimension of creepiness through this (and I'm sure it's already happening in some aspects, inside Google & Meta.)

In the past, in ad tech & search, etc., people could say things like: "Yes, it's highly targeted. Yes, we've correlated an absolutely huge quantity of data to fingerprint you exactly, and retarget you. But it's anonymized. No humans saw your personal data. It's just statistics." Not saying whether or not this argument has merit, just repeating it.

But now, here we are, where "just statistics" is a far more intricate learning model. One which is capable not just of correlating your purchases and browsing activity, but of "understanding" you, and which -- while not an AGI -- is pretty damn smart.

At what point does "a computer scanned your browsing for patterns and recommended this TV set" become ethically the same as "a human read your logs, and would like to talk to you about television sets..."?

Having worked in ad-tech before (and having worked at Google, in ads and other things as well), I do not trust the people in that industry to make the right decisions here.


Seems like a disguised ad for that fingerprinting service. Resist fingerprinting was already set to true in my Firefox. "Worse than I thought" apparently means "I thought there was no fingerprinting but I found out there is fingerprinting."


Try to get a non-unique on iPhone. I’ll admit it’s worse than I thought.


Is anyone trying to tie users to multiple devices, and consequently identify both fingerprints as being from one user? I.e. Let's say I visit HN on both my laptop and on my mobile phone, each will have a very different fingerprint, but not only do I visit the same site on both devices but I am unlikely to do so simultaneously across the two devices, and there are likely to be other factors such as not visiting on either device during sleeping hours, not visiting on either device before some date (i.e. when I got into HN).

Perhaps you could call this something like 'cross-device fingerprint unification', idk.


I think it would have to have some code that tied those two fingerprints together... something like `fingerprint.identifyUser("jefc1111")` which would then store both of those fingerprints against your user id.


Another technique that could be used would be to look for distinct fingerprints that have both visited the same extremely niche web addresses. Someone is surely doing this already.


We did a demo at CES in 2015 which retargeted users on a secondary device.

The demo delivered an ad-unit on mobile after viewing an ad-unit on TV.


If I have a certain phone model with updates applied. Is there something that distinguishes me from other people with the same phone and browser version other than the IP address?


Literally the comment below yours mentions GPU fingerprinting (0). Regardless of whether you won or lost the silicon lottery, you're different enough to be tracked.

(0) https://www.bleepingcomputer.com/news/security/researchers-u...


I, too, would like to know this and find it odd it wasn't mentioned at all. Most web traffic these days is from mobile devices, not desktops/laptops. And Apple at least seems to try doing a decent job of obfuscating trackable info by default on top of massive numbers of people having the same device (probably not true for Android).


I have been trying Apple devices for an hour now. Nothing I can do gets them to pass the eff or linked tracker sites.

Firefox, VPN, privacy extensions, nothing works.

Apple has work to do.


Do you have the same localization settings? The same timezone? The same browser settings? The same screen orientation? The same model of bluetooth headset? These are all likely factors in a client-side fingerprint.


"Given there are companies selling fingerprinting as a service, if you want to really protect yourself from fingerprinting, you should use Tor Browser or Firefox with resistFingerprinting=true."

Fingerprinting services try to figure out your browser settings. Since very few people have this feature enabled, you might be easier to fingerprint by enabling it. A metric that has historically been used for fingerprinting is the "do not track" feature, which is a bit of irony.


How many websites do you need to visit before being unique in the world?

Say I follow AS Monaco football, then look for Lego Castle figurines and finally visit a forum on Alaskan Malamute dogs. The combination of these three websites is pretty close to unique in the world imho.

Surely most people can be uniquely identified after visiting a couple more, unless we change browser and ip-address and GPU and set resistFingerprinting=true and ... and clear cookies after every website we visit.
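A back-of-the-envelope version of that intuition. The audience fractions below are invented; the point is only that the probabilities multiply, so a handful of niche interests is already near-unique:

```python
# Invented fractions of all internet users who visit each niche site.
internet_users = 5_000_000_000
visit_prob = {
    "AS Monaco fan site": 2e-4,
    "Lego Castle figurine shop": 5e-5,
    "Alaskan Malamute forum": 1e-4,
}

# Assuming independence, the chance someone visits all three multiplies out.
p_all_three = 1.0
for p in visit_prob.values():
    p_all_three *= p

expected_others = internet_users * p_all_three
print(f"expected other people with the same trio: {expected_others:.4f}")
```

With these toy numbers the expected number of other people sharing the trio is well below one, i.e. the browsing history alone identifies you.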


The idea of Incognito mode is that the website should be unable to detect that you are using it.

There is a bug in Chrome, which I reported, but they told me they will not fix it: https://bugs.chromium.org/p/chromium/issues/detail?id=120485...


You are not detecting incognito mode but another attribute that correlates with incognito mode.


On iOS I visited fingerprint.com in Safari twice and then opened Brave with its "Block fingerprinting" setting enabled, and it registered it as my third visit! They should label it "resist", as that would be a lot more honest.

And https://www.amiunique.org/ says I’m unique in Brave compared to “nearly” in Safari haha


Same. I tried an iPhone with a VPN change, cleared cache, Firefox on iOS, blocking extensions, and… it gets me every time.


I think Firefox might actually enable this by default for third-party sites, but I'm not 100% sure what this about:config one does:

  privacy.trackingprotection.fingerprinting.enabled
This would make sense since messing with values for the root frame could cause unwanted side effects, but you're not likely to care if some iframe gets your screen resolution or CPU count wrong.


Strange, `privacy.resistFingerprinting = true` did not solve the issue for me; I'm still fingerprinted by https://fingerprint.com/. Even after clearing all caches and restarting Firefox.

Adding the extensions `CanvasBlocker` and `Temporary Containers` did solve the issue though.


Anyone know if there's been any forks of Chrome that enforce more privacy features? I know Chromium is a thing, but I doubt the builds for Chromium (except when tweaked by some Linux distros) do much like Firefox does.

I only use Chrome to test some things, or to create a completely isolated browser session disconnected from my use of Firefox.


Brave, Iridium, and Bromite come to mind.


Iridium sounds like it might be what I want, thanks!


Why do these systems use hash-based fingerprinting? Wouldn't it be "better" to have a "browserspace vector", or "browser embedding"? So that if one fingerprint tactic fails in incognito, you don't completely lose the fingerprint, you just get a slightly different vector?
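Linking by similarity instead of an exact hash is straightforward to sketch. The attribute sets below are hypothetical; the idea is that incognito only perturbs a couple of signals, which breaks a hash but barely moves a "browser vector":

```python
def jaccard(a, b):
    """Set-overlap similarity: 1.0 means identical attribute sets."""
    return len(a & b) / len(a | b)

# Hypothetical attribute sets for the same browser in two modes.
normal = {"ua:Firefox/111", "res:2560x1440", "tz:UTC+1", "gpu:M1", "ext:uBO"}
incognito = {"ua:Firefox/111", "res:2560x1440", "tz:UTC+1", "gpu:M1",
             "quota:small"}

score = jaccard(normal, incognito)
print(f"similarity: {score:.2f}")  # high enough to link the two sessions
```

An exact hash of either set would differ completely, but a tracker thresholding on similarity (here 4 of 6 attributes shared) would still connect them, which may be one reason hash-only demos understate the problem.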


Is anyone here working at Apple?

https://niespodd.github.io/webrtc-local-ip-leak/ still (?) leaks the local IP in mobile Safari. On browserleaks, the local IP check fails, giving a false feeling of safety.


I remember once in college I'd shared my phone's hotspot to connect a TV to the internet after it had abruptly stopped working. And all of a sudden the ads being shown (on YouTube) switched from the local language to mine, both of which are completely different.


The combination of IP, language preference and available fonts is very potent and not obvious to Americans.


Asking the wider audience here: I have uBlock Origin installed on my Chrome browser, and I surf mostly in incognito mode. I know this is nowhere close to an optimal setup, hence asking. What setup do you folks use to prevent being tracked as best you can?


It is even worse than the OP realizes. They should run these EFF tests [1] to see how severe the problem is and that it is practically impossible to combat.

[1] https://coveryourtracks.eff.org/


I tested this myself with LibreWolf and Firefox. LibreWolf, which is supposed to be hardened and has resistFingerprinting on by default, didn't stand a chance: the visitor ID was always the same. In Firefox the visitor ID was always different.


There's a flipside question as well, how many users have the same fingerprint as you?


It depends on your browser. If you have a common iPhone with Safari in your local language, many people have the same fingerprint. Configure your iPhone with Chrome (which is a WebKit view on iOS) and another language, and you are suddenly much rarer. Compile Firefox on your Arch Linux with Nouveau drivers and a 16:10 screen, and you are unique in your area.

You can experiment there: https://coveryourtracks.eff.org


The problem is that the alternative, native applications, is even worse. Let's face it, some level of identification comes with networking; there are ways to anonymize connections, but none is perfect.

Tracking should be limited with legal means.


As another vote against JS: this website is able to accurately detect at least half of the extensions I have installed on Chrome.

https://browserleaks.com/chrome


I just tried to turn on `resistFingerprinting` in Firefox and it meant that my zoom preference for HN got reset every time I opened a new page (I have it set to 120% by default). Anyone know why? Bug?


Not wanting to be fingerprinted means using the default options for everything, which is what Firefox enforces when the setting is on.


Ohhh that makes sense! Meaning there's a way for a website to detect my zoom level? I wonder how? Checking calculated font size via JS or something? Ugh


I guess we could make our browsers lie about the parameters that are collected during fingerprinting; that would be far more convenient than disabling JS, etc.

EDIT: Or block the extraction


If you buy two of the same exact model iPhone and boot/config IDENTICALLY, on the same Wi-Fi network, they would have the same fingerprint, right?


Tested fingerprint.com with Vivaldi Mobile and it didn't correlate me across a normal tab and an incognito one, so it's not foolproof...


From my testing, this doesn't seem to work on Safari. But it actually works on Chrome. Another reason for using Safari instead of Chrome.


I find it scary, coming back to the fingerprint.js site after years, to still be correctly identified and to see the exact dates I visited.


Interesting, despite me not using a VPN, it has me “identified” in the totally wrong location (in fact, multiple wrong locations within minutes).


Maybe your fingerprint is just common? Location probably comes from geoip.


Actually quite surprised to see that this identifies me on Safari in Incognito, after visiting in regular mode first.


And on Chrome - even counts the number of times you visit in incognito


Damn yea I didn't know it was so easy to do in practice (I just heard about theoretical approaches)


It is sad that Brave + uBlock Origin + DDG Privacy Essentials does not seem to be able to fight this.


Wouldn't the browser and extensions make you more unique?


I tested it on Brave mobile (Android) and I got a different fingerprint each time.


The demo got my browser totally wrong. It has me showing up in various places around the country, and I don't use a VPN. On one of the dates I was out of the country and my laptop was at home, turned off.


You are using cellular data, then. Your exit point when using 3G-5G can be in a lot of strange places. Not unusual.


“World's most accurate”: source, FingerprintJS. Sounds legit.


I was able to "trick" the fingerprint.com test by opening it first with firefox, then with tor browser. Gave two different visitor IDs. So as suspected, it largely relies on IP address.


A different browser with the same IP also gives you a different hash.


It would be nice to have also the Safari evaluation.


With resistFingerprinting enabled in FF, it resets the zoom level I have set on each individual site, so it's just annoying.


In hindsight it’s clear. Why did we allow the web advertising mega corp to own the browser we use? Huge conflict of interest. There’s no way our privacy was going to survive.


We use web fingerprinting and adjacent methods to crack down on ID sharing for our SaaS that charges (per person). I make no apologies for this practice.


How does that work out for you? This doesn't strike me as a good use of fingerprinting:

- Since you charge per person, what about people that use multiple machines and browsers (with presumably different fingerprints)?

- On the other hand, unless two people share the same workstation and computer account, how do you expect to use fingerprints to detect license abuse?


We use other signals as well: time of day, IP address, a new cookie logging out the old cookie. At the end of the day we are dealing in probabilities, but we can definitely find the most aggressive sharers.


What is the use case for these fingerprints when adhering to the GDPR? You can't store them in a DB and use them to target a returning anonymous visitor with products relevant to their last visit. You can't send them to a third-party ad service to get more relevant ads. Isn't the whole point of the fingerprint to maintain a pseudonym for your users over some time window? But that requires storing them, which would be against the GDPR.


To prevent spam. If someone is spamming your site, how do you tell whether a request is coming from a legitimate user or from the spammer? Fingerprints are how you tell the two apart.
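As a sketch of that idea: key a rate limit on the fingerprint hash instead of the IP, so a spammer who rotates IPs but reuses the same browser still gets throttled. All names and thresholds here are invented for illustration:

```python
import time
from collections import defaultdict, deque

POST_LIMIT = 5          # max posts per fingerprint per window (illustrative)
WINDOW_SECONDS = 60.0

_recent = defaultdict(deque)  # fingerprint -> timestamps of recent posts

def allow_post(fingerprint, now=None):
    """Sliding-window rate limit keyed on the browser fingerprint."""
    now = time.monotonic() if now is None else now
    q = _recent[fingerprint]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()               # drop posts that fell out of the window
    if len(q) >= POST_LIMIT:
        return False              # over the limit: likely a spammer
    q.append(now)
    return True

# The sixth post inside the window from the same fingerprint is rejected.
results = [allow_post("abc123", now=float(i)) for i in range(6)]
print(results)  # [True, True, True, True, True, False]
```

The weakness, as the rest of the thread points out, is that the same mechanism that catches spammers also works for tracking ordinary users.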


Agreed. Cross-site tracking of devices without consent is going the way of the dodo. With respect to fraud prevention, being able to analyse device signatures along with identity and behavior on a per-site basis is the only reason we are able to enjoy what's left of the 'open web'.


I thought that was most commonly dealt with via a first-party cookie, i.e. show the captcha to anyone who doesn't have the cookie. At least that's how it feels when you browse incognito.


What happens if they solve the captcha? If a captcha-solving service costs $0.02 per 1000 captchas, that means they can post. It costs about $10 to post a spam message every minute for an entire year, even if they get banned after every post. If you want to annoy a site owner, that would be an easy way to do so.
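The arithmetic in that comment checks out; a quick sanity check using the comment's own hypothetical rates:

```python
cost_per_captcha = 0.02 / 1000     # $0.02 per 1000 solves
posts_per_year = 365 * 24 * 60     # one spam post per minute, all year
annual_cost = posts_per_year * cost_per_captcha
print(f"${annual_cost:.2f}")       # $10.51 a year to spam once a minute
```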


Surely if your website collects data using browser fingerprinting this is covered by GDPR and you have to tell your visitors/ask for permission?

https://www.eff.org/deeplinks/2018/06/gdpr-and-browser-finge...


I believe that, despite their claims, this fingerprinting technique actually DOES violate the GDPR.


They claim to be ‘GDPR and CCPA Compliant: Your compliance officer will love us, too’. However, GDPR defines ‘personal data’ as ‘any information relating to an identified or identifiable natural person’, and this includes ‘an identifier such as an online identifier’. Therefore, browser fingerprinting may also fall under the scope of GDPR.


GDPR doesn't really apply outside of Europe, despite what the EU might claim.


The EU does not claim that it applies outside of Europe, just that the law applies to all your customers/visitors that are within the EU.


IIRC they do try to claim it applies outside of Europe; they say their laws apply to any entity processing data of EU citizens, regardless of where the data or website actually lie.


I think it's well within the rights of the EU to legislate in which way the data of its citizens is processed. If your product or service is accessible to EU citizens, in the EU market, then you need to abide by the laws of the EU. It's no different for physical or virtual products.


Many EU websites carry speech which is illegal in other countries.


> If your product or service is accessible to EU citizens, in the EU market, then you need to abide by the laws of the EU

It's not that simple though.

If I offer a website in the US, I can collect the info of anyone that visits it as long as I am not breaking US law.

If the EU doesn't like that, then they can block my site.

They claim though that I am subject to their law if I harness the data of Europeans.


Yes, this is what they claim. As long as your company has no physical offices in the EU you probably don't have to worry about it. If your company grows bigger, you probably should.


That was my only point, that they claim something which simply is not true.


also, one could just roll it up into a wall of fine print or something, no? who reads these things anyway?


> also, one could just roll it up into a wall of fine print or something, no?

That also violates it. Facebook just lost in court in the first instance trying that.


GDPR requires that an opt-out be available and be just as easy as opting in. Fine-print disclaimers are illegal.


What the world needs:

<body onload="javascript.disable()">


GDPR should have been approached at browser level. But there would not have been money to make for those that provide "compliant" banners. I guess the economy needed the stimulus.


Too bad the biggest violator also created the biggest browser.


It's not too late. The EU is breaking Apple's and Google's mobile app store monopolies next year with the Digital Markets Act.

Those same two companies effectively control the browser market. If there's political will in Europe, they can be forced to implement working privacy controls.


> GDPR should have been approached at browser level.

GDPR. isn't. about. browsers.


I totally agree with you! .. and second: the website navigation would be smoother without those banners!


GDPR isn't about cookies, browsers or the web.

> But there would not have been money to make for those that provide "compliant" banners.

Are you serious? Do you think WordPress addon makers lobbied GDPR through the European Parliament?


Note also: As the number of APIs increases, so does the fingerprinting. E.g. MIDI device enumeration (no prompt in Chrome, prompt in FF, not implemented in Safari): https://twitter.com/denschub/status/1582730985778556931?s=20


We need 2 classes of web: one document-based that doesn't require JS to run (secure); and an insecure one, for SPAs and anything that requires JS to see the full content.


The dark web is the document-based web. Sites built for Tor Browser have to assume JavaScript is disabled. So they have to rely on server-side rendering, old-school HTML forms, HTML meta refresh, etc.

Surprisingly, one thing that seems to work just fine in this environment is phpBB (even modern versions). Lots of phpBB dark web forums.

Also surprisingly, this doesn’t preclude polish or some level of app-like stateful interactivity, because CSS still works. You just have to think differently about how you use it.


Back in the day, we had a nice boundary between the document and the "app". Then for some reason we decided that Flash doesn't need to be a thing any more and erased that boundary by building the app functionality into browsers themselves, making the app and the document inseparable. We should have invested that effort into building an open source Flash player instead.

One of the nicest things about Flash was that you could set your browser to only load and run Flash content after you click it.


Java applets were worse though: every time I got a virus of any sort from merely browsing generic sites, it was due to Java in the browser. I finally stopped installing Java for the web and my security problems went away.

Flash had some security nightmares all the time too if I remember correctly, but I don't think it ever screwed me over like Java did.

I think unless we lock down new APIs that aid in fingerprinting to only be accessible to WebAssembly, and let people block or enable WASM, there's not too much else we can do. It would be nice to be able to block web APIs selectively to limit what a JS script can do.


> Flash had some security nightmares all the time too if I remember correctly, but I don't think it ever screwed me over like Java did.

Those incessant RCEs were only due to the sloppy way the Adobe Flash player was written. There is nothing inherently bad, security-wise, about the SWF format itself.

Ruffle is an open source Flash player in Rust, currently under active development. I'm sure it won't have such problems because 1) it's open-source and 2) it's in Rust, and I was told that anything written in Rust can't possibly have any memory-related vulnerabilities; we'll wait and see if this would still hold true if/when they implement JIT compilation for AS3.


> I think unless we lock down new APIs that aid in fingerprinting to only be accessible to WebAssembly, and let people block or enable WASM, there's not too much else we can do.

IMO, it should be enough if incognito mode presents an identical fingerprint on everyone's browser.


It's not that easy to "present a fingerprint" without compromising the user experience. Sure, you could remove all those PWA and pretend-OS APIs and hardly anyone would notice, but what about things like viewport size and font rendering? You can't exactly hide them from a website.


> what about things like viewport size and font rendering? You can't exactly hide them from a website.

Of course you can. Viewport? Just return fake viewport data containing the most statistically common display properties. Website renders incorrectly? They only have themselves to blame, shouldn't have abused that data for hostile purposes. Data is a privilege, we can and should take it away. Fonts? Just force everything to use Noto Sans or Noto Mono. Everything will render correctly. Maybe the designer's vision won't be fully realized but that's not a problem.


> It's not that easy to "present a fingerprint" without compromising the user experience.

And that's exactly what I'm talking about.

> what about things like viewport size and font rendering?

Not much can be done about viewport size, but a browser could easily ship with 2 fonts (one serif and one sans serif) and only allow access to those.


“Font rendering” is a different thing than “what fonts you have.” Font rendering is about how fonts are drawn to the screen. The trick is to draw some words to a <canvas> and then pixel-peep the result. Different OSes and browsers use different font renderers and font hinting logic; fonts will even render differently on a different-DPI screen.


Different browsers are always distinguishable, but a single browser could choose to always use the same font rendering code and settings, at least for private browsing.


Not just the <canvas>. The font rendering of the underlying platform also influences the width of strings. So if you create a <span> with some text, its width will differ by several pixels depending on the host OS.


Don't they already use freetype to parse webfonts?


Is that a standard — that all browsers are forced to use the Freetype library, or to be bug-for-bug compatible with its glyph+hint parsing semantics? I've never heard of anything like that.

But also, even if they did, AFAIK browsers still mostly lean on OS text-drawing APIs for font rendering. Text in Chrome on Windows looks different than text in Chrome on macOS, etc. The same pile of beziers, and the same pile of hints, converts into a different set of hinted pixels (and sub-pixels!) when fed to each OS text-drawing API. Especially when those APIs are configured by user settings around subpixel hinting / "font smoothing", and when those APIs are aware of the device being rendered to and so render subpixels differently for high-DPI vs low-DPI screens, RGB vs BGR displays, etc.


For viewport, you can limit the size presented to the page to a few sizes with different aspect ratios. Browsers can simply rescale the page to the actual window size for display on the screen. That also works for font rendering.

If users decide they want pixel-perfect display, they can either resize the window to one of the allowed sizes or disable this feature for a specific page.
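A sketch of the bucketing idea above: report only the nearest allowed size to the page and let the browser rescale. The whitelist of sizes is invented for illustration:

```python
# Hypothetical whitelist of reported viewport sizes (width, height).
ALLOWED_SIZES = [(1280, 720), (1366, 768), (1920, 1080), (375, 667)]

def reported_viewport(actual_w, actual_h):
    """Snap the real window size to the closest allowed bucket, so
    thousands of distinct real sizes collapse into a handful of values."""
    return min(
        ALLOWED_SIZES,
        key=lambda wh: (wh[0] - actual_w) ** 2 + (wh[1] - actual_h) ** 2,
    )

print(reported_viewport(1337, 742))  # (1366, 768)
```

The trade-off is exactly the one debated here: the fewer buckets, the less identifying the value, but the more the rescaled page deviates from pixel-perfect rendering.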


You are describing the "compromising the user experience" alternative.


The user and JavaScript can simply see different things, like AdNauseam does. The user experience is already shit.


I think we will end up with something like permission grants (including granular JS APIs available to the website, as we do for the location and camera APIs at the moment) per website, and convenient tools built into the browser that allow you to create/re-use patterns so you aren't actually interrupted by this strictness too much.


> per website, and convenient tools built into the browser

Per-website, for dozens (if not hundreds) of APIs and convenient? These are contradictory :)


Yeah, it sounds a bit overwhelming for users, but my point here is that we would need appropriate tooling offered to end users so they don't get lost (quickly, lol).


> We need 2 classes of web: one document-based that doesn't require JS to run (secure).

I've wondered for a long time if a sort of posh gopher based on markdown with extensions would be able to make a comeback. Especially if it allowed for CSS.


But not current web CSS, please. It manages to be simultaneously overcomplicated (including enabling fingerprinting) and really bad at laying out text (e.g. still no baseline grid). A ‘markdown web’ style sheet should be more like document processor's character/paragraph styles. It also needs to be easily overridden for accessibility or alternate presentation, particularly around size and colour (a markdown-like format should already be fine for screen readers without styling). Aside: FFS, web people, if you're setting colours at all, respect @prefers-color-scheme and do not use the inverse for code blocks.

There's also the million-markdowns problem, and markdown's HTML embedding. This being Tuesday, I'd start with djot (without embedding), but Wednesday I might go for asciidoc.


You're sort of describing Gemini.

https://en.wikipedia.org/wiki/Gemini_(protocol)


Except that is too limited even for a document web.


As with most things, this isn't really a technical challenge, it's a social one. The protocols you're describing already exist, more or less. No one uses them.


Why not just good old Web 1.0, or even HTML5 without JavaScript? There are already plenty of pages that conform to that; you just need the client enforcement (also already available via extensions) and marketing/lobbying so that big organizations switch to it.


Yeah, that's never going to happen; JavaScript is a lost cause. There's no way backwards compatibility will lose out to "a bit more privacy", especially when the people in control benefit immensely from this.


Which class you're in will be under the control of the developer, and they'll always choose SPA, even for the presentation of static text.

And to be fair it makes a lot of sense because writing HTML templates feels super jank once you've experienced not doing it. Even for a site with static content I would still prefer to deliver it as a static JS bundle and a data payload.

I really like https://docsify.js.org. It's got to be one of the lowest-touch libs out there. The whole site, from git repo to page, is one single, completely static asset.


MIDI device enumeration is behind a permissions prompt, though? "The user must explicitly grant permission to use the API though a user-agent specific mechanism, or have previously granted permission." https://developer.mozilla.org/en-US/docs/Web/API/Navigator/r...

EDIT: nope, not as implemented in Chrome https://www.jefftk.com/test/webmidi


At the time of the tweet not in Chrome: https://twitter.com/denschub/status/1582730988118867968?s=20


And the tweet is correct, unfortunately: https://www.jefftk.com/test/webmidi

Looks like Chrome is trying to change this, and is slow as usual: https://groups.google.com/a/chromium.org/g/blink-api-owners-...


> is slow as usual:

It's funny because for anything Chrome deems beneficial to Google they are anything but slow, including shipping APIs that no other browser agreed on.


Having been someone at Google working on new browser APIs, that's slow too. But maybe it doesn't look as slow from the outside?


> But maybe it doesn't look as slow from the outside?

Google ships 400 new APIs per year. It readily ships an API within a month after it spits out a half-prepared spec and asks other browsers for input.

Even benign changes like CSS headline balancing was sent to TAG three weeks ago, and will ship a month from now.

From the outside this is breakneck speed with utter disregard for anything. But when user privacy is concerned? Nah, they must take their sweet time to do anything.


> Even benign changes like CSS headline balancing was sent to TAG three weeks ago

The "text-wrap: balance" proposal is not new, though? I see it in the 2019-11-13 draft spec: https://www.w3.org/TR/2019/WD-css-text-4-20191113/#valdef-te...


Perhaps, but it was sent for TAG review only three weeks ago, and already with an intent to ship in a month.


Isn’t that the point of the RFC approach to standardization? “Here’s what we think should be done, with the PoC being what we’re already doing ourselves in production; feel free to try our impl out, in order to better notice the design flaws in practice, so that we can talk out what changes could be made before it becomes a de-facto standard we’re all stuck with”? SPDY → HTTP2 was a great example of Google doing exactly this.

The opposite of the RFC approach is the “airy design document written by standards body in reference to nothing, never implemented by anyone” approach; and I know which of the two I prefer.


> Here’s what we think should be done, with the PoC being what we’re already doing ourselves in production

The problem with having anything in "production" on the web is that you can neither update nor change it, because people will rely on it.

The idea behind web standards is that there should be at least two independent implementations, tested behind a flag, with iterations on design, before it becomes a full standard.

Chrome's approach for the past several years has been: spit out a half-completed spec, "ask" other browsers for input... and ship it in prod a month later.


It looks like that was just because it was very uncontroversial, though? https://github.com/w3ctag/design-reviews/issues/822


Didn't Firefox also require the user to install an extension before enabling MIDI support?

Edit: I think MDN confirms this, with the asterisk next to Firefox: https://developer.mozilla.org/en-US/docs/Web/API/Web_MIDI_AP...

Edit 2: oh, the tweet shows two prompts, one of them to install the extension, so I suppose that is actually the prompt you're referring to.


Is there a Firecracker VM or something similar that comes preconfigured with a browser and VNC/RDP, that can be used like a native browser but runs in a VM that isn't fingerprintable?


For anyone to whom this is news: this is why I always call the "I don't care about cookies" extension an adtech submarine. It deceives you into thinking it's all about cookies, when the permissions you grant automatically are, in many cases, about tracking; using that extension will often have you consent that fingerprinting you and creating a profile based on it is perfectly fine.


To me, the thing is that I can't count on the consent modals to actually do anything. Am I really going to invest time into checking their word? How would I even do that? That's on top of all the time wasted moving sliders or hitting "reject all".

For me, the cookie consent modals are the submarines. Why would I outsource the responsibility not to track me to the people with the incentive to track me? IDCAC, Cookie Autodelete, and strict tracking protection feels like the better alternative for me.

(From today onwards, I'll add resistFingerprinting=true to that list as well.)


Obviously this is only relevant for people who think companies care a bit about trying not to flout the law; I thought that was a given.

There are also proper consent blockers [0], but they are not as big because everyone tells people to use that please track me shit.

[0]: https://github.com/cavi-au/Consent-O-Matic


Implying they actually stop tracking when you press "Reject"


They may not, but if you're in/from the EU and press "Reject" and they still track you, they're breaking the law.


"break the law" means nothing when you're a large corporation that makes more money from breaking it than you spend on fines.

Tech companies routinely get fined for what may seem like massive amounts to us.

https://www.businessinsider.com/the-7-biggest-fines-the-eu-h...

If "breaking the law" meant something they would try to avoid doing it so often.

Microsoft in the 90s was found guilty of abusing their monopoly. What were the consequences? Nothing. This sort of thing used to mean something; see Standard Oil v. United States. But the current world is a world that belongs to megacorporations.

Tracking people against their will is a drop in the ocean of what corporations get away with.

This, in many ways, is like a billionaire getting a parking ticket that doesn't amount to more than a few hundred dollars. The billionaire doesn't care.

Law only has meaning when the punishment is coercive.


[dead]


Your customers perhaps, but google and facebook definitely use it for tracking.


[flagged]


Just because you can doesn't mean you should. Worst ethics ever. I hope you go broke.


I disagree. I hope the guy becomes wildly successful, so that a leak of his methods gets in the news here and we know how to protect against that as well.

What you suggest is to put our heads in the sand instead. No, no and no. I prefer to be exposed to the worst so we learn how to protect ourselves. That's why this is Hacker News and not PutOurHeadInTheSand News.


The main use cases we're tackling are financial fraud, scams, account takeover and more. Over $32 billion is stolen online yearly due to financial fraud, and browser fingerprinting has proven to be one of the most reliable ways to combat sophisticated fraudsters.


Next you tell me this thing will be saving us from child porn or even terrorism.


Wow, that really is a shameless plug.


For a crappy product as well, double shameless..!


What makes you think that this is a worthwhile addition to the world?


We're focused on serving ethical use cases such as combating fraud, account takeover, scams and more.


How many clients have you refused to work with for ethical reasons after they offered you money?


But if I was a fraudster you'd still take my money, right?


"That's how web works."

Nah. I make an HTTP request and I get a response. That's how the web works. Perhaps people can have different opinions on "how the web works".

Web fingerprinting relies on a heap of assumptions. For example, that someone uses a web browser to make HTTP requests, that the web browser sends certain HTTP headers in a certain order, that the web browser runs Javascript, that it processes cookies, recognises HSTS response headers, and so on and so on.

If all the assumptions are true, maybe web fingerprinting is effective. But if the assumptions fail, maybe web fingerprinting does not work so well.

I have only ever read blog posts about web fingerprinting that take all the assumptions as true.

The majority of traffic on the internet is said to be "bots". Not web browsers running Javascript, processing cookies, and so on.

It seems to me that someone should discuss what happens when the assumptions fail.

Do advertisers care about computer users who do not use graphical browsers much. As such a user, IME, the answer is no.

(Interesting to see how defensive replies get. It's obvious the "tech" crowd intent to spy on web users is heavily reliant on certain assumptions to remain true forever. It shows that there is necessary pressure to keep web users using a "preferred" web browser and web ""features" that will subject them to "web fingerprinting". Perhaps the assumptions will always be true, conditions will never change, in the same way that interest rates could never change.)


> "bots". Not web browsers running Javascript, processing cookies, and so on

Even the simplest bots nowadays can run JavaScript and process cookies. What's much harder for a bot (or some other actor that has been doing shady things across many websites) to uniquely fake are things like the graphics card (WebGL vendor & renderer), audio, and other hardware, which get queried during fingerprinting.

Full fingerprinting is relatively expensive, so it originally was used by fintechs to combat fraudulent/automated signups, but with the third-party cookie situation it might be already economical to track regular users for ads/retargeting.
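Mechanically, a fingerprinting script boils down to reading attributes like those and hashing them into a stable visitor ID. A toy Python sketch (the attribute names and values are illustrative, not any real product's):

```python
import hashlib
import json

def visitor_id(attributes):
    """Stable hash over canonicalized attributes: the same browser yields
    the same ID, while changing any one attribute yields a different ID."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

browser = {
    "webgl_renderer": "ANGLE (NVIDIA GeForce RTX 3060)",  # illustrative value
    "audio_hash": "a91f...",                              # placeholder
    "timezone": "Europe/Berlin",
    "languages": ["de-DE", "en-US"],
}

id_a = visitor_id(browser)
id_b = visitor_id({**browser, "timezone": "UTC"})  # one spoofed attribute
print(id_a != id_b)  # True: a single spoofed value breaks the match
```

This also shows why spoofing helps: real products are more robust than a plain hash (fuzzy matching across attribute changes), but any attribute a browser randomizes per session still degrades the match.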


Let's take the CommonCrawl bot, "CCbot", as an example. There are no images, CSS, or JS files in the CommonCrawl archive. Is the CCbot running Javascript. Is it equivalent to a graphical web browser with all the same features.

GPT-3 was trained on a filtered version of CommonCrawl.

IMO, this is text-only web use. No (fingerprint-friendly) graphical web browser needed. Others may have different opinions. Perhaps I am biased as I use the web this way seven days a week.


>Do advertisers care about computer users who do not use graphical browsers much. As such a user, IME, the answer is no.

Almost nobody does this, so obviously not. You're probably in a group that makes up less than 0.0001% of web users. And that might even be generous.


I think you can still be fingerprinted without cookies or Javascript (e.g. with HSTS supercookies). It's obviously not as effective.



