This service claims to not track personal data, yet their docs admit to storing hash(siteID + User-Agent + IP) + seen_paths on their backend for session tracking.[1]
Sites can track sessions without tracking personal data.
Right below that, the docs also say that this hash is not persisted, only cached in memory and mapped to a UUIDv4. The UUIDv4 is what persists between sessions.
> The IP address and User-Agent are never stored to the database or disk, and there is no conceivable way to trace the random UUID back to this.
>
> It’s only stored in memory, which is needed anyway for basic networking to work.
I can't say whether that is GDPR compliant, but it's definitely not storing the hash.
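For what it's worth, the scheme those docs describe could look roughly like the sketch below. This is not GoatCounter's actual code: the SHA-256 choice, the `|` separator, and the names `session_id` and `_session_cache` are my own assumptions, and cache expiry is left out.

```python
import hashlib
import uuid

# Assumption: a plain in-process dict stands in for the in-memory cache.
# Nothing here is written to disk or a database.
_session_cache = {}


def session_id(site_id, user_agent, ip):
    """Return a random UUIDv4 for this visitor, creating one if needed."""
    # The hash is only a lookup key for the in-memory cache.
    key = hashlib.sha256(f"{site_id}|{user_agent}|{ip}".encode()).hexdigest()
    if key not in _session_cache:
        # The UUID is random, so it carries no information about IP or User-Agent.
        _session_cache[key] = str(uuid.uuid4())
    return _session_cache[key]
```

Only the returned UUID would be stored with the hit; the hash and its inputs stay in memory and disappear when the cache entry does.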
Fetch an empty resource that is privately cacheable, set to max-age=0, and has an ETag containing the current timestamp and a random session id. The browser will consider its cached copy always stale.
When you next fetch that resource, because it is stale, the browser will revalidate it by passing an If-None-Match header containing the ETag. Update the ETag to include the original timestamp and the current timestamp.
So on every page load (or whichever other event you want to measure), you will be told when that session started, the session id and when that visitor was last seen.
To set the maximum session duration, reset the ETag if the last seen timestamp passed to you in If-None-Match is too long ago.
This can even work without JavaScript by using an img element.
The only data tracked with this is the session start time, last seen time, and a random session id. Since the session id isn’t related to any of your business logic, it cannot be used to identify an individual.
To further isolate this data, locate the tracking resource on a different hostname. Cookies are scoped to the host that set them, so the browser will not send your site’s cookies with the request, and your analytics backend can’t record identifying information even if it wanted to. This will also prevent you from tracking which page is being visited, though you can override that with the no-referrer-when-downgrade referrer policy.
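A rough sketch of the server side of this ETag scheme, framework-agnostic. Everything specific here is an assumption for illustration: the `"<id>.<start>.<last>"` ETag layout, the 30-minute idle limit, and the function name `track`. It also ignores weak validators (`W/`) and multi-value If-None-Match headers.

```python
import secrets
import time
from typing import Optional

MAX_IDLE_SECONDS = 30 * 60  # assumed maximum gap before a session is reset


def track(if_none_match: Optional[str], now: Optional[int] = None) -> dict:
    """Read the session from If-None-Match and build the next ETag."""
    now = int(time.time()) if now is None else now
    session_id = started = last_seen = None

    if if_none_match:
        parts = if_none_match.strip().strip('"').split(".")
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            prev_seen = int(parts[2])
            # Keep the session only if the visitor was seen recently enough.
            if now - prev_seen <= MAX_IDLE_SECONDS:
                session_id, started, last_seen = parts[0], int(parts[1]), prev_seen

    if session_id is None:
        # No usable ETag (or it expired): start a fresh session.
        session_id, started = secrets.token_hex(8), now

    return {
        "session_id": session_id,
        "started": started,
        "last_seen": last_seen,  # None on the first hit of a session
        "headers": {
            # Always stale, so the browser revalidates on every fetch.
            "Cache-Control": "private, max-age=0",
            "ETag": f'"{session_id}.{started}.{now}"',
        },
    }
```

The response has to be a normal 200 carrying the new ETag each time, otherwise the browser keeps revalidating with the old one and the last-seen time never advances.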
You just reinvented analytics cookies. You’d be surprised, but they don’t store PII either. It’s usually just a randomized session ID and timestamps, like you’re suggesting.
> In comparison, in the context of the European GDPR, the Article 29 Working Party[6] considered hashing to be a technique for pseudonymization that “reduces the linkability of a dataset with the original identity of a data subject” and thus “is a useful security measure,” but is “not a method of anonymisation.”[7] In other words, from the perspective of the Article 29 Working Party, while hashing might be a useful security technique, it is not sufficient to convert personal data into deidentified data.
Indeed you are correct. Plausible it is not. They should put their cookie consent back up, and they need to inform their users how they are processing the data collected from them.
The problem is that this is only what they say they do; there are too many examples of companies not complying with their own policies and regulations. They should explain the above-mentioned algorithm in their published privacy policy. Also, even a hash can be considered personal data unless it has been protected sufficiently, so you need to inform your users anyway.
Good approach. IP addresses are personal data, so the data and the hash are subject to GDPR.
You still need consent to collect it - well or some other kind of legal shenanigans. The intent is to track a person, it is not technically necessary. You might have a legitimate interest - but in the end you still have to consider the GDPR to use this tool.
Turns out that many officials believe this is fine. Companies using Plausible, Matomo and similar services have been under scrutiny.
The IP address is required for the site to function; your server can't not collect it. Plausible also only processes it for uniqueness and doesn't save it as is. Interestingly, most web servers and firewalls have to keep track of IP addresses, so they will be saved in access logs and caches, making them more problematic than Plausible. Yet it's most likely fine because the intent is not to track individual users but to improve the service and keep it running. Plausible's intent is also not to track individual users but to collect visitor counts, which is something used for improving a service too.
I have experience from state-funded projects in Central European countries. AFAIK what they battle/hate most is what goes against the spirit of the law: mainly popups that are hyper-designed to be confusing, so that people are forced, tricked, or annoyed into accepting everything.
Another thing they battle is how long data is saved and where the data is shared.
If you self-host a service like Plausible or Matomo that does everything possible to be compliant, then it's fine.
I think there is a marketing tactic that ad/analytics companies and marketers use against services like Plausible. They say these services also require a cookie popup and won't give you as much detailed info, so why would you use them? Most websites would be fine with the limited data Plausible provides, but it breaks the ad/analytics industry's business plan.
> Plausible also only processes it for uniqueness and doesn't save it as is
That's exactly the point. Processing of personal data to identify a unique person.
Regarding firewalls and logs: It's argued that this is legitimate interest, as stated in Recital 49 of the GDPR. So they got a free pass, for better or worse.
> I think you might be permanently spreading fear
Don't get me wrong, I like the approach. But it's not a get out of GDPR free card.
> That's exactly the point. Processing of personal data to identify a unique person.
Not sure that's what I said. They cannot identify a unique person. They identify unique legitimate visits per day.
If logs and firewalls count as legitimate interest because you have to give the server your IP address for everything to work, then the same can be said about Plausible, especially since the IP address is immediately thrown away, unlike with firewalls, where the main point is to keep a record of bad actors.
It is very different from Google Analytics, where the whole point is to pinpoint repeat visitors, their behaviour, etc. You simply can't do that with a service like Plausible. What you can do is know how many legitimate visits you had and what was visited. For most websites that is enough. At the same time, I would be surprised if knowing how many people visited your site were not a legitimate requirement for a service to function.
Legitimate interest still requires the data subject to be informed under Art 13. Not sure how that would be accomplished without at least an info banner. (This goes for server logs too.)
If you have a website you have to write this in your Privacy Policy and most do.
Firewalls are a curious case. It is argued that the data is not collected but transmitted to the controller. Almost as if you get a letter with personal data and now have to deal with it.
Yes, it's a stretch. Not happy with it but I don't see any practical solution either...
AFAIK it's not enough to write it in your privacy policy. Art 21 of the GDPR makes this explicit:
> (4) At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 and 2 shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information.
I am not a lawyer, but as far as I can tell, there is no legal way to collect PII (including IP address) or place tracking identifiers on the user's device without at least informing the user explicitly under the GDPR and the ePrivacy Directive.
You are correct. In the early days of the GDPR, people thought about putting a page in front of the original page, without any data collection, presenting only the privacy information.
But soon there was agreement that Art 13 lit. 4 could be interpreted such that, as long as you don't have any data collection beyond server logs, the privacy policy would be deemed sufficient. Or in other words, as long as you don't invoke Art 21 lit. 1 of the GDPR.
But since everybody wants to track you on the basis of their legitimate interest, the web became full of cookie banners.
That's a bit simplistic. IP addresses are not unequivocally personal data. Let's rewind back a bit, GDPR Art. 4:
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
IP addresses only allow identifying a natural person when combined with other data, such as ISP data or a profile built over dozens of websites. This is not the same kind of personal data as a name + address, Breyer notwithstanding (note the bit about the ISP in the judgment).
GDPR is not about identifying an abstract entity, it's about identifying a natural person. Doing the former for long enough/with enough data allows the latter, but especially with time-limited in-memory hashes that's a non-existent window of opportunity.
In practice this'd probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less fined.
> In practice this'd probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less fined.
Agreed.
Plausible just makes false claims like:
> All the site measurement is carried out absolutely anonymously. Cookies are not used and no personal data is collected. There are no persistent identifiers.
That's a heavy statement and it is simply not true, as you quoted:
> an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person
hash(daily_salt + website_domain + ip_address + user_agent) will fall under this definition.
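To make it concrete why that counts as an online identifier, here is a minimal sketch of such a hash. This is not Plausible's actual implementation: SHA-256, the NUL separators, the 16-byte salt, and the helper names are all assumptions on my part.

```python
import hashlib
import secrets
from datetime import date

# Assumption: one random salt per day, kept in memory and replaced on rollover.
_salts = {}


def daily_salt(day):
    """Return the salt for `day`, discarding salts from earlier days."""
    if day not in _salts:
        _salts.clear()
        _salts[day] = secrets.token_bytes(16)
    return _salts[day]


def visitor_id(day, website_domain, ip_address, user_agent):
    """hash(daily_salt + website_domain + ip_address + user_agent) as hex."""
    h = hashlib.sha256()
    h.update(daily_salt(day))
    for part in (website_domain, ip_address, user_agent):
        h.update(b"\0" + part.encode())
    return h.hexdigest()
```

Within one day, the same visitor on the same site always gets the same value, which is exactly what lets you count them as unique, and also what makes it an identifier for that day. After the salt rotates, yesterday's values can no longer be reproduced.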
But again, you are right: it's better than anything any other service does.
What are your thoughts on aggregated data? You can still identify unique visitors, but it's aggregated data, so you can't link it back to the individual.
I have doubts that just identifying unique visitors would also identify individuals. Their current approach of creating a random ID which is unique for 24 hours should not violate GDPR, or would it?
You begin at a point where you have data to aggregate. This data is linked to individuals.
Anonymisation of data is data processing, and some argue that it is subject to a privacy impact assessment, since doing it poorly has serious negative consequences for individuals if they can be deanonymized.
The duration itself does not change the outcome.
That said, the approach Plausible takes is much better than any cookie.
I think you can argue if this holds up: you cannot retrieve the ip from the hash (and residential IPs are usually dynamic). The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.
No one will get fined for not asking consent for this. Our DPO just said ‘don’t be silly’ when I asked him. But we will see if it gets tested (my bet: it won’t).
You don't need to retrieve the ip to make it PII, the hash itself is PII.
You might not think of it as containing actual "personal information", but its sole purpose is to attempt to uniquely identify a person. That makes it PII.
> (and residential IPs are usually dynamic)
This actually makes the short lifetime more suitable as PII, because it reduces the likelihood of the same IP being used by a different person being tracked as the same person.
> The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.
That also doesn't matter, because the lifetime of the token is long enough to track the user through an entire typical session, maybe several.
The stupid thing in all these shenanigans is that collecting the data isn't itself the problem, it's not getting the user's consent. Just tell the user what you're doing, and it's not a problem - if it's a "technically required" cookie they can make an informed choice to use your site or not, if it's an "optionally required" cookie, they can choose whether to accept or not. Most users won't care and will click on the biggest, most obvious buttons. The ones that do care are likely atypical and would skew your metrics anyway.
You can as long as you have IPv4 visitors, because the search space is small enough to brute-force. There are only four billion IP addresses. The user-agent complicates things a little but there aren’t many of those, so you could retrieve the IP addresses of most visitors from the hash if you wanted to.
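A small sketch of that brute-force, under loud assumptions: the attacker knows the salt, the hash construction (SHA-256 over `|`-joined fields here, which is made up for the example), and a shortlist of likely user agents. It only walks a single network so it finishes quickly; the full IPv4 space is the same loop over roughly 2^32 addresses times the number of candidate user agents.

```python
import hashlib
import ipaddress


def fingerprint(salt, domain, ip, user_agent):
    """Assumed hash construction, for illustration only."""
    return hashlib.sha256(f"{salt}|{domain}|{ip}|{user_agent}".encode()).hexdigest()


def recover_ip(target, salt, domain, user_agents, network):
    """Try every address in `network` with every candidate user agent."""
    for addr in ipaddress.ip_network(network):
        for ua in user_agents:
            if fingerprint(salt, domain, str(addr), ua) == target:
                return str(addr), ua
    return None  # not in this network, or user agent not in the shortlist


# Usage: recover an address from its hash within a /24.
leaked = fingerprint("salt", "example.com", "203.0.113.77", "UA-2")
print(recover_ip(leaked, "salt", "example.com", ["UA-1", "UA-2"], "203.0.113.0/24"))
```

If the salt really is random and discarded, this only works while someone still has it, which is the whole argument for keeping it in memory and rotating it.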
> residential IPs are usually dynamic
Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way.
I get what you're saying - in that if you know the IP address, then you can often easily discover who the individual is. I'd counter that actually, for most people this isn't the case - for many companies, only the ISP, Google, Apple, Facebook etc. know who the real user of an IP is... (incidentally, the people most keen to force analytics on you, but that's another issue).
However, that is all kind of moot. The hash itself is PII, because it can be used to track an individual. PII isn't about the difficulty of determining the specific identity of a user, it's about the difficulty in identifying a specific user. The distinction is subtle, but important.
Take an example - people are using a wireless hotspot somewhere, maybe you own a coffee shop, and over the course of a few weeks, you're alerted to the fact that someone has been accessing some illegal content that could get your business in trouble. You've been careful to comply with the GDPR, and your logs only include the time and hostname of the server accessed. On its own, there is no PII there. But combine that with, say, credit card transactions or video footage, and you can find out who was in the coffee shop every time this happened. Then boom! Suddenly, your timestamps have become PII. Maybe not uniquely correlated to a single person, but to a group of people. With every instance correlating that person with a different group of random people, it doesn't take many to narrow it down to a specific individual.
This is why, to actually comply with GDPR, you need to only store logs for as short a time as is technically required (legally beyond a month is hard to justify, ideally a few days at most) and then you should aggregate into groups where individuals cannot be isolated. If your aggregations result in groups of people that are too small, you need to change the aggregation groups, or report an empty group. It's totally fine to store data like "on this day, n people went from this page to this page, average linger time blah seconds" if n is 10 or more. If n is 1 or close to it, that data is still identifying.
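The aggregation rule above is easy to sketch. Assumptions: raw events are `(day, from_page, to_page, visitor_token)` tuples with a short-lived token, the threshold of 10 comes from the example in the comment, and too-small groups are simply dropped from the report. The function name is made up.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 10  # groups smaller than this are treated as identifying


def aggregate_transitions(events, min_group_size=MIN_GROUP_SIZE):
    """Count distinct visitors per (day, from_page, to_page), suppressing small groups."""
    visitors = defaultdict(set)
    for day, from_page, to_page, visitor_token in events:
        visitors[(day, from_page, to_page)].add(visitor_token)
    # Only the counts survive; the per-visitor tokens are not returned.
    return {
        group: len(tokens)
        for group, tokens in visitors.items()
        if len(tokens) >= min_group_size
    }
```

After aggregating, the raw events would be deleted, so what remains is "n people went from this page to that page" with n never below the threshold.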
That part was responding to where you said "Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way."
My point is that whether you can determine the IP address from the hash or not doesn't matter. The hash itself is PII.
You would still have to produce the paperwork for this.
Most websites don't get fined for using GA. Plausible is a huge step in the right direction, but their claims are very strong and not backed up by the GDPR if you take a closer look.
Regarding fines: most offices will give you a warning instead of a fine; you adjust your cookie banner and you are good to go.
1. https://www.goatcounter.com/help/sessions