
IIRC, in some countries census data is sometimes released with small, intentional errors so that specific individuals can't easily be located. A 36-year-old is sometimes recorded as 37 or 35; a 180 cm person as 182 cm or 178 cm. The errors are small enough not to invalidate the aggregate data, but large enough to make it hard to identify individuals from it.

Perhaps this is a partial solution for the Netflix dataset.
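
A minimal sketch of that kind of perturbation, in case it helps. The `perturb` helper, the offset of 1, and the synthetic ages are all assumptions for illustration; this is not how any census bureau or Netflix actually does it.

```python
import random
from statistics import mean

def perturb(values, max_offset, rng):
    """Add a random integer offset in [-max_offset, max_offset] to each value."""
    return [v + rng.randint(-max_offset, max_offset) for v in values]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic ages, made up for illustration (not real census data).
ages = [rng.randint(18, 90) for _ in range(10_000)]
noisy_ages = perturb(ages, 1, rng)

# Individual records may be off by a year, but the averages stay close.
print(mean(ages), mean(noisy_ages))
```

Each record moves by at most one year, while the mean over many records barely changes because the offsets average out to roughly zero.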



This is the approach that Netflix took with the initial data. The paper referred to shows that this is insufficient and does little to ease privacy concerns. The general problem is that if you 'fuzz' the data enough to make identification impossible, it's no longer useful as a dataset.


The Census obfuscations have apparently screwed up a variety of research findings:

http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-tru...



