
These types of SAR requests (even milder ones) are of course impossible to handle manually. Self-assessment, the way most companies decided to handle GDPR, isn't much help here. How do you automate personal data discovery, especially for already existing data?

Funnily, the biggest fear companies have regarding GDPR and SAR does not originate from "Mr. I. Rate the customer", like in this article. It comes from disgruntled employees ratting on the company. Employees know best where personal data is stored (and often no one else in the company does), so they can really do some surgical damage. GDPR introduces a whole new dynamic.

This may be a good place to shamelessly plug a tech we developed (Show HN!) for automatically locating personal data across corporate resources: https://pii-tools.com

Personal data discovery is but a small piece in the compliance puzzle, but a piece that is critical to understanding what sensitive data is even out there: CVs with photos in backups? Scanned passports in attachments of email archives? Names and addresses in database tables? How about S3, Azure, GDrive?

Let me also add that there's no shame in not having a comprehensive view of all the corporate personal inventory. Larger companies grow their resources organically, through acquiring other companies and separate business units doing their own thing. It is a complex problem, but one where technology can help.
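To make "personal data discovery" a bit more concrete, here is a toy sketch of the simplest possible form of it: pattern matching over plain text. This is not how pii-tools.com works (nothing above describes its internals). The names `PATTERNS` and `scan_text` and both regexes are made up for illustration, and real discovery would also have to cope with scans, photos, archives and database tables.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately naive patterns; real detectors need far more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scan_text(text: str) -> Dict[str, List[str]]:
    """Return every pattern label that matched, with the matched strings."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


if __name__ == "__main__":
    print(scan_text("Contact jane.doe@example.com or call +44 20 7946 0958"))
```

Even this toy version shows why the problem is hard: names, addresses and passport scans do not match any simple pattern.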



> How do you automate personal data discovery, especially for already existing data?

You attach an owner id to every record, and make sure all your systems can dump all information they store according to owner id. To the extent existing systems don't, you fix them.
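A minimal sketch of that outline, assuming every system can expose a "dump by owner id" function. The class name `SubjectAccessExporter`, the `owner_id` field and the in-memory "systems" are all hypothetical stand-ins for real databases and services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
DumpFn = Callable[[str], Iterable[Record]]


@dataclass
class SubjectAccessExporter:
    """Registry of per-system dump functions, keyed by system name."""
    dumpers: Dict[str, DumpFn] = field(default_factory=dict)

    def register(self, system: str, dump_fn: DumpFn) -> None:
        self.dumpers[system] = dump_fn

    def export(self, owner_id: str) -> Dict[str, List[Record]]:
        # Ask every registered system for the records tagged with this owner.
        return {name: list(fn(owner_id)) for name, fn in self.dumpers.items()}


def by_owner(rows: List[Record]) -> DumpFn:
    """Build a dump function over an in-memory table (stand-in for a real query)."""
    return lambda owner_id: [r for r in rows if r["owner_id"] == owner_id]


# Two toy "systems"; every record carries an owner id.
orders = [{"owner_id": "u1", "item": "book"}, {"owner_id": "u2", "item": "pen"}]
tickets = [{"owner_id": "u1", "subject": "refund"}]

exporter = SubjectAccessExporter()
exporter.register("orders", by_owner(orders))
exporter.register("tickets", by_owner(tickets))
report = exporter.export("u1")
```

The sketch only works for systems that were registered and for records that actually carry an owner id; anything stored outside that path is invisible to it.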


Charming response :-) Entire industry dismissed in a single HN comment. Poof!

I'm not sure we understand "data discovery" to mean the same thing, but you reminded me of "How To Draw An Owl":

http://sethgodin.typepad.com/seths_blog/2014/01/how-to-draw-...


Hrm, did you expect me to design the output of an entire industry in an HN comment? I didn't say it was easy to do. But it is what must be done. My goal was not to provide code, but an outline, a very rough sketch, rough to the extent that it could fit in a pair of sentences. I guess in that sense the owl metaphor is accurate!

We've had two years to work on this. At my company, we've had entire teams spending significant fractions of their time over the last year prepping. As a result, we'll be ready when the switch flips.


It's refreshing to see such a responsible approach.

What you suggest is (as far as I understand you) orthogonal to automated data discovery / inventory mapping, though.


I agree we are not using the same definition of data discovery. In my use case, you know a priori which user provided the data, you just need to plumb the information through to all downstream systems. This seems sufficient for GDPR as I understand it. I had not read your entire comment and did not realize you were promoting a system to try to do something like this automatically. I did not realize the initial question was rhetorical.

FWIW I would be worried about relying on such a system! But based on the description it seems helpful. What does it do about derivative data that doesn't directly contain any PII?



