One of the claims of Umami is that it's GDPR compliant:
> Umami does not collect any personally identifiable information so it is GDPR and CCPA compliant. No cookie notices are needed because Umami does not use cookies.
From auditing the source code, this doesn't seem to be the case. First, it claims it doesn't use cookies, but it clearly uses localStorage to store a "sessionKey"[0].
The other claim, that Umami is GDPR and CCPA compliant because it does not collect any personally identifiable information, is only half true. While the data collected isn't PII (because you can't use it on its own to identify a user), it's still "personal data". This is because the "sessionKey" stored alongside all events is actually a pseudonymous user identifier. It's really just a hash of the user's IP along with a few other properties[1]. Because the data Umami collects, when combined with some other data, can be attributed back to the user, the data is still considered "personal data". That means you're still subject to most of GDPR, such as GDPR deletion requests[2].
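To make the pseudonymity point concrete, here is a minimal Python sketch of a deterministic session key. The field list (`website_id`, `ip`, `user_agent`) and the `session_key` function are hypothetical stand-ins, not Umami's actual code in `lib/session.js`. The only point is that an unsalted hash of stable inputs yields the same identifier every time the same visitor returns.

```python
import hashlib

def session_key(website_id: str, ip: str, user_agent: str) -> str:
    """Hypothetical session key: a plain SHA-256 over stable request properties."""
    material = "|".join([website_id, ip, user_agent])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

ua = "Mozilla/5.0 (X11; Linux x86_64)"

# The same visitor returning next week produces the identical key,
# so every event tagged with it can be linked into one profile.
monday = session_key("site-1", "203.0.113.7", ua)
next_week = session_key("site-1", "203.0.113.7", ua)
print(monday == next_week)  # True

# Anyone who later learns the visitor's IP and user agent can recompute
# the key and pick out that person's events.
other_visitor = session_key("site-1", "203.0.113.8", ua)
print(monday == other_visitor)  # False
```

Hashing hides the raw IP, but it doesn't break the link between the stored events and a person, which is what makes the key pseudonymous rather than anonymous.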
I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR. I am using the same techniques as Fathom Analytics, Plausible.io and other products. Everything is hashed into a unique session id and none of the actual data like user agent or IP address is actually stored. It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.
As for the localStorage, it's just for performance so I don't have to recompute the session hash. The product will work the same without it. But seeing as it is a cause of contention, I am probably going to remove it.
Both Fathom and Plausible generate a unique salt every day. By discarding the old salts, they've anonymized any data older than a day. From [0]:
> We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR.
> Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.
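A rough sketch of the rotating-salt scheme the quote describes, assuming one random salt per day that is held only in memory and replaced when the day changes. `DailySalter` and its methods are hypothetical names, and the use of HMAC-SHA256 is my own choice; Fathom's and Plausible's real implementations may differ in detail.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional

class DailySalter:
    """Hypothetical rotating-salt store: only the current day's salt is held."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        if day != self._day:
            # Replace the previous salt; this object keeps no copy of it, so
            # earlier days' IDs can't be recomputed from IP + user agent.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_id(self, day: date, website_id: str, ip: str, user_agent: str) -> str:
        material = "|".join([website_id, ip, user_agent]).encode("utf-8")
        # Keyed hash (HMAC-SHA256) with the day's salt as the key.
        return hmac.new(self._salt_for(day), material, hashlib.sha256).hexdigest()

salter = DailySalter()
ua = "Mozilla/5.0 (X11; Linux x86_64)"
d1_a = salter.visitor_id(date(2020, 8, 17), "site-1", "203.0.113.7", ua)
d1_b = salter.visitor_id(date(2020, 8, 17), "site-1", "203.0.113.7", ua)
d2 = salter.visitor_id(date(2020, 8, 18), "site-1", "203.0.113.7", ua)
print(d1_a == d1_b)  # True: uniques can still be counted within one day
print(d1_a == d2)    # False: the same visitor is unlinkable across days
```

The window length is just the rotation condition in `_salt_for`; a shorter window links fewer visits but also makes "unique visitors" less meaningful.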
I will probably implement the daily salt and remove the localStorage code as well just to be safe.
But again, I'm not a lawyer here: where do you draw the line? Why not hourly salts? 5-minute salts? What is considered a reasonable effort? At some point you're storing data that can identify a user for the purpose of analytics. Still, I'm going to try to lean to the safer side as best I can.
Option 1: Accept that you're collecting personal data, and satisfy the obligations GDPR places on that. This means disclosing the use of analytics in your privacy policy (what data's being collected and why), listing retention periods, and figuring out how to satisfy requests like Access or Deletion (which may include "we can't identify you in the data we previously collected").
Option 2 is to "comply" with GDPR by finding a loophole under which it technically doesn't count.
The Option 2 approach is more common when dealing with American data privacy laws. It doesn't work out so well with GDPR. It's very difficult to not be processing personal data at some point. Even if you fully anonymize your data before doing any non-trivial processing, the anonymization itself is still covered by GDPR, which means you need to include it in your privacy policy and provide an opt-out.
It's also high-risk. If a court decides that you didn't quite thread the needle through the loophole in their country and GDPR therefore applies in full, then you haven't done any of the compliance groundwork.
For GDPR compliance, I would be much more inclined to trust a tool that describes how to opt users out of tracking than one that claims they're immune from obligations to opt-out.
As another commenter mentions, the ePrivacy Directive is a whole different kettle of fish. Strong consent is needed to read or write any data not strictly necessary to provide the services requested by the user. That law is due for a saner update soon... though it's been that way for a few years now.
Doesn’t using the website ID in the hash mean the key is no longer PII, since it can’t follow you between websites? Or is being identifiable within a single site enough to meet the threshold?
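A quick way to see what including the website ID buys you, with a hypothetical unsalted `session_key` standing in for Umami's hash: the key differs between two sites, so they can't join their data on it, but within one site it still links every visit by the same person.

```python
import hashlib

def session_key(website_id: str, ip: str, user_agent: str) -> str:
    """Hypothetical session key scoped to one website via website_id."""
    material = "|".join([website_id, ip, user_agent]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()

ip, ua = "203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)"
blog_visit_1 = session_key("blog", ip, ua)
blog_visit_2 = session_key("blog", ip, ua)
shop_visit = session_key("shop", ip, ua)

print(blog_visit_1 == shop_visit)    # False: no shared key across sites
print(blog_visit_1 == blog_visit_2)  # True: still one identifier per person on "blog"
```

Scoping removes the cross-site link; it does not remove the within-site one.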
Fair point. I was simply following the "common practice" from other products making these claims, which is to not store personal user data and only generate anonymous IDs.
Maybe that's not fully compliant, I don't know, so I went ahead and removed any mention of GDPR from the website. It's not really my goal anyway. I'm just trying to release free software while they are charging money and making these claims.
The IDs that you generate aren't anonymous like Plausible.io's. You simply need to address that issue and you should be mostly there for GDPR compliance.
An IP address is considered personally identifiable information in at least Germany. If you're storing that you'll already have to think about the GDPR.
This is just another misguided attempt to adhere to the letter of the law while going against its spirit. It is misguided because it's based on a wrong understanding of what the letter of the law actually is. You see this a lot with adtech and analytics companies who try to skirt regulations through elaborate mechanisms, but ultimately in vain.
>This is just another misguided attempt to adhere to the letter of the law while going against its spirit.
It's easy to say this and hard to draw a line between PII and what I can store without consent. "yesterday I sold 5 products on my website" is not PII (I hope). If I store the timestamps for each purchase I'm already in the grey area. One could combine the timestamps with other data to identify my customers.
I've listened to a podcast interview with a lawyer specializing in EU privacy laws, and he said that it does not matter if the personal data is hashed or encrypted. It's still personal data. This was about data stored in a database though, and browser local storage is a database.
This was mentioned when the guest spoke about the right to be forgotten. The law is really weird, because you need to delete users' data from your database, but it's OK to keep backups.
> It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.
It can exist as long as the user agrees to be tracked. There is a category of "metrics" cookies that the user needs to agree to before you can track them for metrics. That's the whole point of the law: you need the user's permission.
It’s different because it allows reidentification. It prevents you from coming up with an IP or what have you out of thin air, but you or another party you give it to can effectively use it as a perfect proxy of whatever you hashed.
Let’s take a hashed IP address. There are 4.3B IPv4 addresses, so it takes a few minutes on an old laptop to generate a rainbow table. With decent hardware it would be seconds. The rainbow table could then be used to identify all the IPs you store. If they are salted, then each IP would need to be brute forced, but that's still only seconds on good hardware.
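To illustrate how cheap the reversal is, here is a minimal Python sketch that recovers SHA-256 IP hashes by enumerating candidates. To keep it fast it only searches the 65,536 addresses of the 10.0.0.0/16 range rather than all 2^32 IPv4 addresses; the full space is the same loop, just 65,536 times longer. Strictly speaking this is a plain precomputed lookup table, not a rainbow table (which trades time for space with hash chains). `hash_ip`, `build_lookup` and `crack` are hypothetical names.

```python
import hashlib
import ipaddress
from typing import Dict, Optional

def hash_ip(ip: str, salt: bytes = b"") -> str:
    """Hypothetical analytics ID: SHA-256 of the IP, optionally with a salt prefix."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()

def build_lookup(network: str, salt: bytes = b"") -> Dict[str, str]:
    """Precompute hash -> IP for every address in the network (unsalted case)."""
    return {hash_ip(str(addr), salt): str(addr) for addr in ipaddress.ip_network(network)}

def crack(target_hash: str, network: str, salt: bytes = b"") -> Optional[str]:
    """Brute-force a single hash by trying every address; needs the salt to be known."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr), salt) == target_hash:
            return str(addr)
    return None

# Unsalted: one table reverses every stored hash at once.
stored = hash_ip("10.0.42.17")
table = build_lookup("10.0.0.0/16")
print(table[stored])  # 10.0.42.17

# Salted, with the salt stored alongside the data: each hash is brute forced.
salt = b"per-record-salt"
print(crack(hash_ip("10.0.42.17", salt), "10.0.0.0/16", salt))  # 10.0.42.17
```

The salt only helps if an attacker can't get it; a salt stored next to the hashes just forces the per-hash loop instead of a single table lookup.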
That would still require correlating data from another dataset outside this product. Compliance with the request would be up to whoever hosted this and whoever holds the other dataset anyway.
Without correlating data it really isn't "personal" though. You could delete the user account and related data without touching this product and you'd have complied, because this data could then never be correlated. Also, if nothing in the activities leaks the user's own identity, then again it wouldn't really be personal.
If you don't want to get dragged into a lawsuit when a user gets sued on a GDPR claim, you probably shouldn't make any statements about your product's GDPR compliance. Stick to the facts about how your product works, and leave the legal speculation to the lawyers.
"In the strictest interpretation of GDPR, I don't think any analytics product can exist."
That's the point. Unless you aggregate the data.
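One reading of "unless you aggregate the data" is to keep only counters and never persist per-visitor rows. A minimal sketch, assuming hypothetical raw events shaped as `(day, path)` pairs that are folded into daily pageview counts and then dropped; whether a given aggregate is coarse enough is still a judgment call.

```python
from collections import Counter
from typing import Iterable, Tuple

def aggregate(events: Iterable[Tuple[str, str]]) -> Counter:
    """Fold raw (day, path) events into counts; no identifier or exact time survives."""
    return Counter(events)

raw = [
    ("2020-08-17", "/pricing"),
    ("2020-08-17", "/pricing"),
    ("2020-08-17", "/"),
    ("2020-08-18", "/"),
]
daily = aggregate(raw)
raw.clear()  # only the aggregate is kept

print(daily[("2020-08-17", "/pricing")])  # 2
```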
Besides, it's not only GDPR you should consider, but also the latest cookie verdict by the CJEU. You need consent if you drop cookies, session storage or any other tracking technology, no matter whether you process personal data or not.
This might help you. It is roughly 2 hours long, but as far as I am concerned it is the best explanation of GDPR I have ever seen, done in mostly non-legal speech. It's actually fun to watch (the part about borrowing a car is hilarious):
Consent is only one potential basis for processing under GDPR. There are others such as "legitimate interest" which the controller and/or processor may rely on.
Since this is about cookies and IP addresses, GDPR is not the most relevant EU law. Instead, we have to look at the old ePrivacy Directive.
For cookies or any other access to information stored on the user's device, that access must either be strictly necessary for performing the service explicitly requested by the user, or consent is required (ePD Art 5.3). This is where those annoying cookie banners come from. LocalStorage isn't any different and would require the same consent as cookies.
For traffic data such as IP addresses, processing is allowed if it's technically necessary for the “transmission”, if the data has been anonymized, if it's required for billing purposes, or if the user has consented (ePD Art 6). There is an argument that security logs might be necessary, other uses like analytics are more dubious. The good news is that Umami seems to properly anonymize the IP address, so this part seems fine.
In cases where ePD mandates using consent, we cannot fall back to another GDPR legal basis such as legitimate interest. Of course this discrepancy between ePD and GDPR is a huge problem, and the promised ePD update has yet to materialize.
Would randomly generating the session key instead of hashing client IP and other properties satisfy GDPR’s requirement of no PII?
The definition in GDPR Art. 4 reads: [1]
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
My intuition is that a randomly generated session key could not be tied back to the identity of a natural person, as long as client IP, user agent, etc., are also excluded from the analytics data.
My understanding is that it counts as an “online identifier”. It’s not all that different from a user ID, except the user didn’t ask you to create it (which certainly doesn’t help under GDPR).
As long as you can connect the ID to one single client/user, it is PII. It does not matter where this ID comes from: a random hash or an encrypted IP address. If it's unique, it's PII.
If you only save it on the server, not on the client side, it's not PII. But then it's almost useless for analytics. Because next time the user comes around, you create another hash and therefore another user.
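The trade-off in that paragraph can be sketched as follows, assuming a hypothetical `handle_visit` that mints a random key with `secrets.token_hex` and never sends it back to the browser: nothing can be recomputed from the IP, but every return visit looks like a new visitor, so "unique visitors" collapses into a visit count.

```python
import secrets
from typing import List

def handle_visit(log: List[str]) -> str:
    """Hypothetical server-only session key: random, never stored client-side."""
    key = secrets.token_hex(16)
    log.append(key)
    return key

log: List[str] = []
for _ in range(3):  # the same person visits three times
    handle_visit(log)

print(len(log), len(set(log)))  # 3 3: three visits, counted as three "unique" visitors
```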
If you do something like Plausible.io with daily changing salts, you know only about daily visitors. This might be GDPR compliant.
If you do something like Fathom with chaining requests, you can see daily uniques, bounce rates and click speed. Not sure this is GDPR compliant, though. I would feel better if they ran this through a European GDPR watchdog, which AFAIK they haven't.
If you do something like SimpleAnalytics, using the referrer to find uniques, you can see daily unique visits but with some statistical error. It should be GDPR and ePrivacy compliant without your customers needing to declare your usage or have a data processing agreement with you. But it gets you the least analytical data (we use SimpleAnalytics).
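A minimal sketch of the referrer heuristic, under an assumption about how it works: a pageview counts as a unique visit when the `Referer` header is missing or points to a different host than the site itself. The function name and exact rule are illustrative, not SimpleAnalytics' published code. The statistical error comes from cases such as a visitor opening two tabs from the same external link (counted twice) or returning through a bookmark (counted again).

```python
from typing import Optional
from urllib.parse import urlsplit

def is_unique_visit(site_host: str, referrer: Optional[str]) -> bool:
    """Count a pageview as a new visit unless it was referred from the site itself."""
    if not referrer:
        return True  # direct traffic or bookmark: treated as a new visit
    return urlsplit(referrer).hostname != site_host

pageviews = [
    None,                                 # typed the URL
    "https://example.com/",               # clicked an internal link
    "https://news.ycombinator.com/item",  # arrived from HN
    "https://example.com/blog",           # internal again
]
uniques = sum(is_unique_visit("example.com", ref) for ref in pageviews)
print(uniques)  # 2
```

No identifier is stored at all; the price is that returning visitors and multi-tab sessions can't be told apart.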
None of these can do cohorts, the holy grail of VC analytics.
For cohorts I would think you could make something GDPR compliant with Bloom (Cuckoo) filters.
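A rough sketch of the Bloom filter idea, under assumptions of my own: each cohort (say, everyone first seen in a given week) is stored as a fixed-size bit array, and later visits are tested for membership. The filter keeps no list of IDs and answers "probably in the cohort" or "definitely not", with some false positives. `BloomFilter`, the sizes, and the `id-…` inputs are hypothetical; a cuckoo filter would fill the same role and also supports deletion.

```python
import hashlib
from typing import Iterator

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit array."""

    def __init__(self, m_bits: int = 8192, k: int = 4) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k indices by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

cohort_week_33 = BloomFilter()
for visitor in ["id-a", "id-b", "id-c"]:
    cohort_week_33.add(visitor)

print(cohort_week_33.might_contain("id-b"))    # True (no false negatives)
print(cohort_week_33.might_contain("id-zzz"))  # almost certainly False
```

Retention for a cohort would then be the share of this week's visitors for which `might_contain` returns True, with the false-positive rate as a known error bar.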
[0] https://github.com/mikecao/umami/blob/f4ca353b5c68750bf391e5...
[1] https://github.com/mikecao/umami/blob/master/lib/session.js#...
[2] https://gdpr-info.eu/art-17-gdpr/