I've found that one of the toughest things about GDPR is determining what is or isn't personal data when dealing with less common types of data.
R6DB, a video game stats website, for example, decided to shut down [1] because it was unclear whether the player account IDs and statistics it was scraping from official APIs count as "personal data". (I believe they have since reversed that decision.)
As soon as you leave the classic website tracking/product usage categories, the definitions get murkier. If I shoot a GDPR data request to e.g. Blizzard, do they have to give me any game/stats history they have on me?
> As soon as you leave the classic website tracking/product usage categories, the definitions get murkier. If I shoot a GDPR data request to e.g. Blizzard, do they have to give me any game/stats history they have on me?
If they are able to correlate that data back to your account, and your account contains information that allows them to link it back to you as an individual, then absolutely.
If they keep that information completely anonymized and have properly siloed that data, then they would not.
A good rule of thumb is that if they would be technically able to comply with the request, then they have to comply.
Disclaimer: IANAL but CTO of analytics startup in EU.
What if you don't have the original account's information?
Let's say there's an api on https://example.com/api/v1/users which is world-readable and contains two values: user id and user rank. You scrape all this and put it in your database. I come to you and say "I'm user id 1234 and I want you to delete me from your db". Is this a legitimate request?
This is the situation R6DB is in. Furthermore, in such situations there's a case to be made that this information cannot be deleted if the whole point of the site is to provide global rankings.
It doesn't matter if you can correlate user id and account holder, what matters is that somebody can. That's the same reason why IPs are considered personal information.
>Furthermore, in such situations there's a case to be made that this information cannot be deleted if the whole point of the site is to provide global rankings.
You don't have to delete it, anonymizing is enough. For global rankings you don't need to know that user 1456 has rank 56, you just need to know that somebody holds that rank. So whenever a user requests to be deleted, replace his id with -1 (or choose a random number outside the normal range) and you have satisfied the request (only the PII needs to be deleted, not everything about the user).
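A minimal sketch of that suggestion, assuming the rankings live in a simple list of (user id, rank) rows and that -1 is never a real id; `RankingRow` and `erase_user` are hypothetical names. If the columns left next to the rank can still single someone out, swapping the id alone would not be enough.

```python
from dataclasses import dataclass

# Hypothetical sentinel: any value outside the range of real user ids.
ANONYMOUS_ID = -1


@dataclass
class RankingRow:
    user_id: int
    rank: int


def erase_user(rows: list[RankingRow], user_id: int) -> int:
    """Replace the requesting user's id with the sentinel, keeping the rank.

    Returns how many rows were anonymized.
    """
    count = 0
    for row in rows:
        if row.user_id == user_id:
            row.user_id = ANONYMOUS_ID
            count += 1
    return count


rows = [RankingRow(1234, 55), RankingRow(1456, 56), RankingRow(9876, 57)]
erase_user(rows, 1456)
print([(r.user_id, r.rank) for r in rows])
# [(1234, 55), (-1, 56), (9876, 57)]
```

The ranking itself survives intact: rank 56 is still held by somebody, just not by an identifiable id.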
A lot of user ids are literally randomly generated data (for example any UUID used as user id). That doesn't make it less useful.
But that doesn't mean that everything is personally identifying information. "I am 53 and like Harry Potter" isn't enough to identify me, so it isn't PII and isn't covered by GDPR. My Blizzard username is enough for Blizzard to identify me, so it is PII.
The important concept here is that in Europe, data protection laws are not only protection against corporations, they are also protection against nation states. We had a few bad experiences with the effects of government surveillance, and we had a few bad dictators in the last century.
> For global rankings you don't need to know that user 1456 has rank 56, you just need to know that somebody holds that rank.
Isn't the purpose of global rankings to know how any given user is ranked globally? It's not much of a ranking system if all you can say is, "Yes, we have one million users, and they are ranked 1 to 1,000,000. We are not able to identify who is what rank."
What's wrong with "we have one million users. You are ranked 45698th. The one below you is proplaya92, the one above you chose to have his name redacted" (but we can tell you his playtime on all characters, his k/d ratio, etc).
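The display side of that idea could look roughly like this, assuming each player record carries a hypothetical `redacted` flag set on request. This only hides the name from other users; the operator still holds the underlying data, so it is a presentation choice rather than deletion.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rank: int
    kd_ratio: float
    redacted: bool = False  # hypothetical flag: player asked to hide their name


def display_name(p: Player) -> str:
    return "[redacted]" if p.redacted else p.name


def neighbors(players: list[Player], me: str) -> list[str]:
    """Lines for the player ranked directly above `me`, `me`, and the one below."""
    ordered = sorted(players, key=lambda p: p.rank)
    idx = next(i for i, p in enumerate(ordered) if p.name == me)
    window = ordered[max(idx - 1, 0): idx + 2]
    # Stats stay visible; only the name of an opted-out player is hidden.
    return [f"#{p.rank} {display_name(p)} k/d={p.kd_ratio}" for p in window]


players = [
    Player("topdog", 45697, 1.4, redacted=True),
    Player("you", 45698, 1.1),
    Player("proplaya92", 45699, 0.9),
]
for line in neighbors(players, "you"):
    print(line)
# #45697 [redacted] k/d=1.4
# #45698 you k/d=1.1
# #45699 proplaya92 k/d=0.9
```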
Funny thing is that even if you make such a request, the company refuses, and you then file a complaint that is, say, thrown out, you still won't know what the best practice is, because there is no such thing as precedent there. Each case is judged individually. Actual compliance depends on the interpretation of whichever civil servant handles the complaint about your company.
The question is, who is the owner of this data? Is it Blizzard or is it R6DB? I would say, the burden is upon Blizzard, so they would be the designated Controller of the data. R6DB would then need to sign a Processor agreement with Blizzard, of which one requirement is to delete certain data upon Blizzard’s request.
R6DB could also position themselves as the controller of this data, but that would put quite a bit of the burden on them.
But this is my interpretation of the situation. Would probably need a more in-depth discussion with both parties to figure out what the intentions are.
Is that really how you think GDPR will be interpreted? As a European, are you OK with that?
If a football team plays, and a stats website (call it ERDB) uses Eredivisie's API to get player stats and then publishes them, should ERDB have to be a processor? Then there would have to be a processor agreement every time Eredivisie issues an API key. And they couldn't have a public API. And if ERDB wants to switch from AWS to GCP, will that require written approval from Eredivisie for a new sub-processor?
It's an interesting question. GDPR lists several grounds on which processing is lawful[1]:
> Processing shall be lawful only if and to the extent that at least one of the following applies:
> processing is necessary for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract;
> processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller;
So in this case, it's probably the case that any athlete will have to sign a contract saying they are OK sharing their stats publicly in order to play. If an athlete sued Eredivisie to try to be forgotten from the league tables, how would that case be decided?
It's also worth noting that in the original "right to be forgotten" case that Google lost, the original newspaper articles reporting the criminal cases were not required to be expunged, so there is still a "public interest" defense for keeping information about people in public. It's certainly unclear exactly where the boundary is on that though.
I wonder also if there is an argument that a professional player's stats are not Personal Data; those stats are a record of events that occurred in public, while engaging in a performance. IANAL, so I'm interested in opinions of more knowledgeable folks on this one.
Under GDPR, the Controller is the entity that the Data Subject gives their Personal Data to, and the preponderance of responsibility for managing that data lies with the Controller.
If the API data under discussion here includes "personal data", then Blizzard shouldn't (under GDPR) be sharing it publicly; Blizzard is the Controller, and it's their responsibility to ensure that only known and compliant Processors get access to the PI. (E.g. there is a requirement that all Processing activity is done under a binding contract: https://gdpr-info.eu/art-28-gdpr/, paragraph 3.)
When an individual wants to be forgotten, they can contact Blizzard (as the Controller) and tell Blizzard to delete all their PI. It's Blizzard's responsibility to ensure that all Processors delete the PD that Blizzard gave to them.
So in this case, my conclusion would be that it's Blizzard's responsibility to either anonymize the data so that it contains no PD (this might be the case already), or stop sharing the PD publicly and require a DPA (Data Processing Addendum) signed by all API consumers. Obviously the second case is not very palatable.
As soon as you could reasonably pinpoint it to a certain person, it becomes PII. A classic example is the combination of zipcode, gender and age. While individually they would not be a problem, it’s the combination of these different parameters that allows you to narrow the pool so much that it can reasonably be assumed to be PII.
> While individually they would not be a problem, it’s the combination of these different parameters that allows you to narrow the pool so much that it can reasonably be assumed to be PII.
There are zipcodes in the USA with only 1 person, so storing just the zipcode would be identifying information. It's an extreme example, but how can a machine check what's 'reasonable' in this case? Go off some census data for the zipcode?
I wonder if there is any real-world information (as opposed to things like in-game points) that can be considered not PII.
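There is also the question of when data counts as personal data (PD) and when it counts as anonymous data (AD).
In Norway, the Norwegian Data Protection Authority states that data which can be linked to a group of 4 or fewer people is personal, while data that cannot be linked to any group smaller than 5 is anonymous.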
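Mechanically, that threshold resembles k-anonymity with k = 5: every combination of quasi-identifiers (like zipcode, gender and age) should be shared by at least k records. The sketch below flags combinations under the threshold; the field names and records are made up, and it is an assumption that a regulator would apply the rule this way. It also cannot see outside data such as census figures, which is the gap raised about single-person zipcodes.

```python
from collections import Counter


def small_groups(records: list[dict], quasi_identifiers: list[str], k: int = 5) -> dict:
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


records = [{"zip": "10001", "gender": "F", "age": 34}] * 5
records.append({"zip": "99950", "gender": "M", "age": 61})
print(small_groups(records, ["zip", "gender", "age"]))
# {('99950', 'M', 61): 1}
```

Dropping or coarsening a field (say, age bands instead of exact ages) and rerunning the check is the usual way to grow the groups past k.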
>If they are able to correlate that data back to your account, and your account contains information that allows them to link it back to you as an individual, then absolutely.
It's amazing that people actually think this type of request is reasonable and that a regulation enabling this type of request will not kill Europe's competitiveness wrt internet companies.
I'm surprised no one has brought this up (yet). Remaining compliant with GDPR is super expensive, simply because of the broadness of the terms used, which leads to an undefined scope of liability.
As one other HNer previously mentioned, the cheapest way to stay compliant with GDPR is to completely block access to EU customers. In fact, this is what I did with my business. I redirect to a generic text file (not even an HTML page that could trigger a GDPR clause by itself) explaining my stance.
Don't worry about the false equivalence many will raise: "So, you don't care about our data, HUH?" The intentions behind GDPR may be good, but the roadmap seems completely stupid. Many think blocking EU customers is an arrogant move. No, it's not. Not everyone has the money and time to comply with GDPR. Re-implementing our web application, for example, would cost us weeks if not months. That could be time spent building features customers want, not fighting some vague elitist law (comply with us or else you're doomed!). If enough business owners block access to EU customers, the EU will lose a lot of business, and that will hopefully push it to do something about the vagueness of GDPR. I don't even live in the EU, yet this re-implementation will cost me tens of thousands of dollars (in terms of my time), and the EU isn't going to pay me for its vagueness.
I've had enough of GDPR. I know many EU HNers will not agree, but please consider putting yourself in a solo founder's shoes.
If you are a solo founder without any business presence in the EU, or clients in the EU, you have nothing to worry about. Blocking EU site visitors is unnecessary, and GDPR covers this in clear language.
What about reimplementing will cost you tens of thousands? That implies you either have a heavy use of personal data without a legitimate interest or have significantly misunderstood the requirements laid out in GDPR.
Let's say I operate a website where you can sign up with an email address, and I'll send you recipes. The recipes call out specific brands of ingredients. The vendors who sell those brands pay me for this service. I and my server are in the USA. Additionally:
1. My food vendors operate (a) all only in the EU; or (b) mostly worldwide, but one operates only in the EU; or (c) all worldwide; or (d) all only in the USA, but an independent third party imports many of their products into the EU.
2. My recipes are for (a) only French food; or (b) all kinds of food.
3. My email includes descriptions of the restaurants that originated the recipes. These restaurants are (a) all in France; or (b) about half in France, half in New Orleans; or (c) all in New Orleans. (New Orleans is an American city with strong French influence on its culture, and a francophone minority.)
4. The recipes are distributed in (a) French only; or (b) English and French; or (c) English only.
5. I advertise my site (a) with a run-of-network ad that shows mostly in the USA, but also on a French newspaper; (b) on a small blog whose American operator doesn't track its audience, but that I've heard is popular in France; (c) not at all, relying on word of mouth.
So I've set out 4×2×3×3×3 = 216 cases. In which of them am I subject to the GDPR? What factors or combinations of factors are determining? If I asked this question of two lawyers, how closely would you expect their answers to agree? What confidence would they express that their answers would agree with the regulator?
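For what it's worth, the case count checks out: it is the product of the number of options per question. A quick enumeration, using the option letters from the list above:

```python
from itertools import product

# Options for questions 1-5 above, by letter.
options = [
    "abcd",  # 1. where the food vendors operate
    "ab",    # 2. which recipes
    "abc",   # 3. where the restaurants are
    "abc",   # 4. language of distribution
    "abc",   # 5. how the site is advertised
]

cases = list(product(*options))
print(len(cases))  # 216
print(cases[0])    # ('a', 'a', 'a', 'a', 'a')
```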
I think the people who think GDPR compliance is easy are saying to themselves, "If I behave in accordance with these general principles as I understand them, then the regulator will see me for the good person that I am and I'll be okay". That may be true, but it's not law.
You have EU customer emails. You're subject to gdpr in all of them. None of the factors you suggested matter. Store your emails properly, have unsubscribe links, answer data requests.
You just gave one of the most straightforward cases...
> If you are a solo founder without any business presence in the EU, or clients in the EU, you have nothing to worry about. Blocking EU site visitors is unnecessary, and GDPR covers this in clear language.
Do you think this statement is correct or incorrect? If you think it's incorrect, then why are you interpreting the "clear language" of the GDPR differently from its poster? (I personally think the statement is either incorrect or too vague to assess.)
If you think the statement is correct--I presume, because you think that as soon as an EU visitor signs up, you have a "client" or "business presence" in the EU--then what meaning does that poster's statement convey? Does it mean anything more than "blocking EU visitors is unnecessary, as long as you have zero EU visitors", a true and entirely meaningless statement?
Or are you saying that the email address makes this different from just a visitor? Even if my server is a typical default configuration that logs time and IP for each visit? Even if my ad network logs tracking cookies? Even if a data broker exists somewhere who could map that information to a real name? Even if I buy that data?
And, for clarity: what about the case where the website (1) calls out only products with authorized distributors only in the USA, (2) has recipes for all kinds of food, (3) describes restaurants only in the USA, (4) is in English only, and (5) isn't advertised? What makes it subject to the GDPR? The EU says in an example that:
> Your company is service provider based outside the EU. It provides services to customers outside the EU. Its clients can use its services when they travel to other countries, including within the EU. Provided your company doesn't specifically target its services at individuals in the EU, it is not subject to the rules of the GDPR.
Do you agree with this guidance? If yes, what factor constitutes the specific targeting? Or does the "travel" language mean we exclude only non-EU residents who temporarily visit the EU? Also EU residents who sign up while in the USA, but then return home? But not EU residents who sign up while in the EU?
How sure are you that your answer is correct? I get that you're sure that it's morally right, and complying is always the conservative choice; but what probability would you assign that an EU court would reach your same conclusion, that all 216 cases are subject to the GDPR?
And since any suggestion that the GDPR is less than perfect attracts angry reactions: I am not asking whether it's a good idea to comply with the general spirit of the GDPR for all visitors worldwide, and I think the answer to that question is yes. I'm asking what the law says. I think the rule of law is important, and I'd venture that most people who have lived in countries with and without it would agree.
It is correct and it is clear. If you have customers in the EU, gdpr matters. If you don't, it doesn't. And either way, you should strive to follow it, because GDPR's rules really are a set of common sense things you should have been doing all along:
- Store your stuff properly (encrypt sensitive data)
I said explicitly that I'm not trying to debate what's morally right, or good for business (since I agree that good privacy practices are both of those). I'm asking what is the law.
> If you have customers in the EU, gdpr matters. If you don't, it doesn't.
Either this is false, or you've adopted an unusual definition of the word "customer". The email recipients don't give me money, but even I agree that e.g. if I promote my site with advertisements only on EU sites, and know or should know that most of my recipients are EU residents in the EU, then it's relatively clear that I'm subject to the GDPR.
Lawyers care about precision of language, because people's lives are at stake. It's not useful to discuss legal matters without that precision.
I guess you're certain enough that all my previous examples are subject to the GDPR that you don't think it's worth discussing why, since you didn't answer any of my questions. How about:
I. I run the website that (1) calls out only products with authorized distributors only in the USA, (2) has recipes for all kinds of food, (3) describes restaurants only in the USA, (4) is in English only, and (5) is advertised on American newspaper websites; but I take no specific measures to exclude EU visitors? I think you think I'm subject to the GDPR.
II. Same as I., but I block EU IP addresses?
III. Same as II., but I ask the remaining people if they're subject to the GDPR, and block them if they say yes?
IV. Same as III., but I require a credit card with an American billing address?
V. Same as IV., but I require evidence of legal American residence (e.g., a scan of a US passport or visa)?
Are you able to answer these questions? Or does it just not bother you that you can't, because compliance with the GDPR serves a purpose you agree with?
What do you understand by the phrase "rule of law"? Do you think it's important? Hungary is an EU country. Its prime minister has described George Soros, the founder of the Open Society Foundation, as an "enemy of the state". If you ran an organization publicly associated with the OSF in Hungary, then how would a notice of a GDPR investigation from your government make you feel? Wouldn't you feel better if the rules gave the regulator less room to maneuver?
Here's what I would do: Nothing. You're in the USA. Your site is in the USA. Don't worry about it. If you receive any undesired correspondence about the GDPR, treat it like you would any other junk mail.
Do you note this somewhere on your sites so that non-EU customers are also aware you're doing this?
While I know you believe that blocking EU customers over GDPR doesn't mean that you don't care about personal data, many of us do feel that way, and I know that I at least would like a way to avoid interacting with such companies in a more convenient way than running everything through an EU-based VPN.
I think Europeans really overestimate how much revenue is derived from European customers. European CPMs are a fraction of American CPMs. Europeans purchase a fraction of the products that Americans do.
Even if I think I'm completely compliant, I think I'd make the decision to limit it to North America until I had enough revenue to hire a separate data protection officer.
will be harder to collect in future. Although the GDPR does carve out exceptions for research, if I were a company, I'd be extremely hesitant to work with anyone coming by and saying "Hey, can I have that dataset which you locked down due to GDPR concerns for my 'research'?".
It's sad really; one of the benefits of the Internet was supposed to be its openness and transparency. Ah well, it was a nice dream for a while.
* I've been trying to confirm that that is the new clause that has been causing problems, but unfortunately, I can't find a historical revision of the Steam API Terms of Use. :( I can only point to articles that say that services are shutting down due to API changes. If anyone can expand on this (or contradict it), I'd be happy :).
What's the general consensus on the last question you raised? Would Blizzard be expected to provide that info? What about "hidden" values like your MMR?
I don't know actually. And lawyers I asked give inconsistent answers. I was planning to make one to Blizzard come 25th just to figure out what their process is.
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
The keyword people like to use is "in context", where data is personally-identifiable if it can be used to identify you in a certain context. This... covers almost everything with a unique=TRUE in a database, and any ranking-type systems such as the excellent example of MMR you gave earlier.
Thanks for posting. This does not look all that different from PCI compliance, though the scope of data involved is larger and as others have observed there's some ambiguity about coverage that will need to be worked out in practice.
Editorially speaking I'm glad to see this emerge even though it means more work for me personally. If anything some of the fines seem too low. (I'm looking at you, Equifax.)
PCI compliance is a lot more annoying than it appears at first.
A business that accepts credit cards can easily end up storing credit card numbers in places it does not intend to. That can happen even if it does all of its intentional credit card handling through an outside service that its checkout pages post to (or call via AJAX), so that in theory no credit card number ever even reaches the business's servers.
For example, customers will email you to tell you that the credit card they pay their subscription with is about to expire, and give their new credit card number and security code in the email (and, for good measure, often the old credit card number and security code too).
Same for your help desk system, regardless of whether it is based on email or web forms. Customers are going to stick credit card numbers in tickets.
I don't think I've personally seen a customer stick a credit card number in a forum post or in a blog comment, but I wouldn't put it past them.
Basically, if customers can put text in it [1], you really have to assume some customer is going to put a credit card into it. I'm not kidding. I've seen credit card numbers show up in name and address fields.
Oh, and when they stick unrequested credit card numbers in emails, support chats, etc., they will often format them in weird ways. If you want to find these, you have to make the scripts that search for them quite lenient in what they accept, but then you get a lot of false positives.
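One way to cut down those false positives is to keep a lenient pattern but only report matches that pass the Luhn checksum, which card numbers from the major networks satisfy. This is a sketch of that approach, not something described in the comment; roughly one in ten random digit strings still passes Luhn, so expect some noise.

```python
import re

# Lenient pattern: 13-19 digits, each optionally followed by a space, dot or dash.
CANDIDATE = re.compile(r"(?<!\d)(?:\d[ .-]?){12,18}\d(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Double every second digit from the right, subtract 9 from results
    over 9, and check that the total is divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits


text = "New card 4111-1111-1111-1111, old one was 4111 1111 1111 1112"
print(find_card_numbers(text))  # ['4111111111111111']
```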
Funny thing is that everything in the Equifax breach (addresses, social security numbers, income) is public information in Sweden and everybody can access it.
Speaking of which, how would the Equifax breach have been affected by GDPR? My understanding is that they had a legitimate reason to request and store that data, and GDPR is not some magic tool that prevents breaches.
They handled the disclosure awfully. The GDPR has time-frames for notifying data protection authorities and the people affected about breaches. Presumably, they would have failed on that count.
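For reference, GDPR Art. 33(1) asks the controller to notify the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of a breach; notifying the affected people (Art. 34) has no fixed hour count, just "without undue delay". A trivial sketch of the authority deadline, with hypothetical function names:

```python
from datetime import datetime, timedelta, timezone

# Art. 33(1): where feasible, no later than 72 hours after becoming aware.
AUTHORITY_WINDOW = timedelta(hours=72)


def authority_deadline(became_aware: datetime) -> datetime:
    return became_aware + AUTHORITY_WINDOW


def is_late(became_aware: datetime, notified: datetime) -> bool:
    # Only checks the 72-hour window; "without undue delay" is a judgment call.
    return notified > authority_deadline(became_aware)


aware = datetime(2018, 6, 1, 9, 0, tzinfo=timezone.utc)
print(authority_deadline(aware))  # 2018-06-04 09:00:00+00:00
```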
Then the question is whether they had appropriate safeguards in place w.r.t. the importance of the data. I don't think we quite know what their systems look like. But the answer is probably not, given the scope, the amount of very personal data, the fact that data on US, Canadian, and UK individuals was leaked, and the time it took to notice the breach.
Factoring both these things in and maybe the fact that an executive was charged with insider trading, they could be fined accordingly.
Beyond just the breach, I think Equifax's business would be difficult under GDPR?
One of the major problems with the breach is that, if you are worried about Equifax's security, you can't just choose to not let them store your data: they will store it, and provide it to third parties, regardless of your consent.
Equifax is a global company headquartered in America that operates in 24 countries, including 3 where people are covered by GDPR: UK, Spain and Portugal[1].
They should be. Aside from operations in Europe, US legislative bodies will eventually wake up and start passing laws to protect data. GDPR will be an obvious practical example to study, much as Obamacare was at least partly modeled on experience in Massachusetts. I have no doubt that many tech companies have already put preventing this at the top of their DC lobbying agenda.
TL;DR: It's a spreadsheet to help you list out what personal data you keep in your organization, who has access to it, how it's secured, how long you plan on keeping it and what you intend to do with it.
MORE: I'm not trying to be dismissive, it's great! And whether or not you're going full-on trying to get GDPR compliant or not, it's _really_ hard to think of a scenario where taking the time to think and document your data handling and security isn't a net positive for the security posture of your application.
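As a rough illustration of what one row of such an inventory captures, here is a sketch; the column names paraphrase the TL;DR above and are not taken from the actual template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataInventoryRow:
    # Illustrative columns, not the template's own headings.
    data: str       # what personal data is kept
    access: str     # who has access to it
    security: str   # how it is secured
    retention: str  # how long it is kept
    purpose: str    # what it is used for


def to_csv(rows: list[DataInventoryRow]) -> str:
    """Render the inventory as CSV, one row per kind of personal data."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(DataInventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [DataInventoryRow(
    data="customer email addresses",
    access="support team",
    security="encrypted at rest",
    retention="until account deletion",
    purpose="sending receipts",
)]
print(to_csv(rows))
```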
When you dig into the GDPR [1] you find that a lot of it is like this: common sense stuff that you'd hope everyone was doing already, but apparently aren't.
- Tell People what you're going to do with their data
- Don't do other stuff with it than what you told them
- Keep it secure
- Don't give it to other companies that might do things with it you didn't tell people about
- If you lose control of it in a data breach: tell them
It’s the implementation that is difficult. For example, if someone wants their data deleted, I need to delete all of the email correspondence and remove them from applications like Google Analytics. If I follow them on Twitter or interacted with them on Twitter, I need to delete those posts. Oh, and then there are backup archives that I now need to scrub. The point is that a lot of data is not well assembled and cleaned for deletion, which is why data analytics on your existing data is so hard in the first place!
And the difficulty can rise exponentially if you have a giant stack that wasn't designed with GDPR in mind (though it may be totally benign in its risk towards data subjects) and which cannot be easily disentangled to comply with some of these requirements.
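A sketch of how the deletion fan-out might at least be tracked, assuming each system (mail archive, analytics, backups, ...) can be wrapped in a hypothetical per-system eraser function. It records which systems failed instead of silently leaving data behind; it does not solve the hard part, which is knowing every place the data lives.

```python
from typing import Callable

# Hypothetical per-system deletion hook; in practice each would call that
# system's own API or kick off a manual process.
Eraser = Callable[[str], None]


def erase_everywhere(subject_id: str, systems: dict[str, Eraser]) -> dict[str, str]:
    """Run every eraser and record success or the error for each system."""
    results = {}
    for name, erase in systems.items():
        try:
            erase(subject_id)
            results[name] = "erased"
        except Exception as exc:  # keep going; report what could not be erased
            results[name] = f"failed: {exc}"
    return results


store = {"crm": {"u42", "u7"}, "mailing_list": {"u42"}}


def backups(_subject: str) -> None:
    raise RuntimeError("backup tapes need manual scrub")


systems = {
    "crm": store["crm"].discard,
    "mailing_list": store["mailing_list"].discard,
    "backups": backups,
}
print(erase_everywhere("u42", systems))
```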
"The GDPR however does make a small concession to companies in this case: the steps they need to take in this direction are limited to the available technology and the cost of its implementation. Organizations must take reasonable measures to ensure processors are aware of the request, but will not be at fault if the data is not completely erased by third parties."
The vagueness there is the problem. I know how I would decide if it were up to me, but if a supervisory authority comes calling, that's just my argument, and they almost certainly have a preset opinion about what is reasonable.
> We are sharing this with anyone who wants to use it! Why?
[Their answer amounts to "because".]
Because they've gotten all the internal value out of it, and now with the deadline here, they can get maximum value out of it by "open-sourcing". Sorry to be so cynical, I'm so sick of this style of embedded advertisement.
That said, I do rather like the simple format of their template. Although their handling of "security" is too thin to be useful in any way, I will otherwise integrate their templates into our current documentary processes, which are awful by comparison.
You're highly unlikely to ever be fined or come under official scrutiny. That being said, you may still want to try and comply with the GDPR:
1. It's becoming a checkbox buying issue for EU customers. Depending on what data you have, what you're doing with it, the fact that you aren't GDPR compliant may mean that your EU customers aren't GDPR compliant b/c they use you.
2. At some level you should care about the security and privacy of your users and while far from perfect the GDPR is a good general framework.
3. Unlikely is not the same as none. Given the tremendous technical leverage that exists now, it's not crazy to think that even a small SAAS with a couple hundred customers might have tens or hundreds of thousands of personal records. (consider an email newsletter service with 200 customers each of whom have 5,000 subscribers -> 1 million personal records).
If you have a million records on hand and a data breach happens, it's quite likely someone is going to complain and you might suddenly be on someone's radar. Don't take that to mean "omg, if I have a data breach I could get fined for $$$$". It's much more like: if you have a data breach, and you haven't done a good job of securing the data, letting people delete it, or having clear data handling rules, and you failed to report the breach, you might be in trouble.
Which requires the organization to supply information about use of personal information. This will have an operational overhead regardless of scope of Personal Information used.
Oh please somebody correct me if I'm wrong. Otherwise that's a silly amount of operational overhead for bootstrapping. Even for systems designed from the start to not use personal data: The org would still need to handle a rather detailed and costly administrative request.
You should obviously comply, otherwise you might face fines and EU business customers basically can't use your SAAS. But as long as you're small, that compliance isn't that hard. It's mostly basic stuff like having backups, encrypting them, not mailing your customers' info around in Excel sheets, not giving everyone access to your servers, updating your servers, having a privacy policy that lists who gets customer info, etc.
The scary stuff is what happens if you have thousands of customers and they start requesting deletion or information. But if you only have a handful of EU customers, you can wait till that happens; you only have to comply within a month.
EU regulators are generally very reasonable. If you show that you made reasonable effort but had a genuine mistake they will generally give you a warning or at worst a low fine.
Before "Who owns and controls the personal data we collect?", there has to be "What personal data do we keep, in actual reality?"
That is, personal data discovery.
Emails, archives, cloud storages… it can be hard to be sure when relying on just human introspection or wishful thinking. Especially for larger organizations.
Automated discovery tools help -- we built pii-tools.com, an AI–assisted tool to locate personal information across corporate assets. If you're having trouble filling in spreadsheets like this with actual data, get in touch (contact in profile).
[1] https://medium.com/@r6db/r6db-is-shutting-down-db1b59b031ac