What the data miners are digging up about you (newscientist.com)
26 points by makimaki on Nov 28, 2008 | hide | past | favorite | 8 comments



"Databases know more about you than you realise. A Carnegie Mellon University study recently showed that simply by knowing gender, birth date and postal zip code, 87% of people in the United States could be pinpointed by name."

Umm, that was back in 2001. The research has come a long way since then. This is what I do for a day job (studying privacy leaks in databases), and IMO it's worse than you might think. Example: http://33bits.org/about/netflix-paper-home-page/

And here's something from two weeks ago: http://33bits.org/2008/11/12/57/


I think that blog's "About" page is one of the best pieces of academic writing I've ever seen, on a per-word basis:

>> This is a blog about my research on privacy and anonymity. The title refers to the fact that there are only 6.6 billion people in the world, so you only need 33 bits (more precisely, 32.6 bits) of information about a person to determine who they are.

This fact has two related consequences. First, a lot of traditional thinking about anonymous data relied on the fact that you can hide in a crowd that’s too big to search through. That notion completely breaks down given today’s computing power: as long as the bad guy has enough information about his target, he can simply examine every possible entry in the database and select the best match.

The second consequence is that 33 bits is not really a lot. If your hometown has 100,000 people, then knowing your hometown gives me 16 bits of entropy about you, and only 17 bits remain. But the real danger is that information about a person’s behavior, which was traditionally not considered personally identifying, can be used to cause serious privacy breaches in a variety of different contexts. >>
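The arithmetic in that quote is easy to check. A minimal sketch, assuming only the two population figures given in the quote (the function and variable names are made up for illustration):

```python
import math

# Figures taken from the quoted "About" text.
WORLD_POPULATION = 6.6e9
HOMETOWN_POPULATION = 100_000


def bits_to_identify(population):
    """Bits needed to single out one person among `population` people."""
    return math.log2(population)


# Bits needed to pick one person out of the whole world.
total_bits = bits_to_identify(WORLD_POPULATION)

# Bits still needed once the search is narrowed to one hometown.
remaining_bits = bits_to_identify(HOMETOWN_POPULATION)

# Bits learned from knowing the hometown.
hometown_bits = total_bits - remaining_bits

print(f"total: {total_bits:.1f} bits")
print(f"learned from hometown: {hometown_bits:.1f} bits")
print(f"remaining: {remaining_bits:.1f} bits")
```

The total comes to about 32.6 bits, the hometown is worth about 16, and roughly 17 remain, matching the numbers in the quote.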


well, that's me, so thank you!


The script http://pastebin.com/f239f43a7 referenced in the above link could be rewritten without the pype module as follows: http://pastebin.com/m5c5a89df (not tested).


I'm sure it could. Thanks for that.

pype is something I use every day as part of my workflow, and I didn't think it was worth the effort making a non-pype version. The other reason I left it in is to motivate myself to properly package and release pype one of these days, along with a video demo and so on. A couple of people I showed it to have said, "holy crap, dude. you can do that? this blows my mind." So I'm thinking there's value in making it usable for everyone instead of just in my own projects.


Such syntax has popped up before: http://code.activestate.com/recipes/276960/ I was interested in how it affects readability and code size. The pype version of the script is both clear and concise (compared to the sloppy version without it).

The concept of Unix pipes (as in the books "The Unix Programming Environment" and "The Art of Unix Programming") or SICP's stream processing (http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html... ) is applicable to a large number of problems.

It would be interesting to see how pype transforms the code from "Generator Tricks for Systems Programmers" (http://www.dabeaz.com/generators-uk/ )

Python lacks a convenient syntax for chaining function calls in a functional programming style (too many parentheses and ugly lambdas). Ruby and Perl do a better job here; Haskell, with its overloading of whitespace, is a clear winner. pype could help with the parentheses, but the lambdas stay.

'|' is unambiguous for non-arithmetic types and easily understood (via the shell analogy). Swapping pype's backend for an implementation based on the multiprocessing package could leverage multiple cores for large pipes.
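For anyone who hasn't seen the trick, here is a minimal sketch of shell-style chaining via operator overloading. This is not pype's actual implementation (I haven't seen its source); `Pipe`, `pmap`, `pfilter` and `to_list` are hypothetical names made up for illustration.

```python
class Pipe:
    """Wrap a function that takes an iterable so it can be chained with |."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # Handles `iterable | stage`: feed the left-hand side into this stage.
        return self.func(iterable)

    def __or__(self, other):
        # Handles `stage | stage`: compose two stages into a new one.
        if not isinstance(other, Pipe):
            return NotImplemented
        return Pipe(lambda iterable: other.func(self.func(iterable)))


def pmap(f):
    """Stage that lazily applies f to every item."""
    return Pipe(lambda iterable: (f(x) for x in iterable))


def pfilter(pred):
    """Stage that lazily keeps only items for which pred is true."""
    return Pipe(lambda iterable: (x for x in iterable if pred(x)))


# Stage that collects the stream into a list.
to_list = Pipe(list)

# Read left to right, like a shell pipeline.
even_squares = (
    range(10)
    | pfilter(lambda n: n % 2 == 0)
    | pmap(lambda n: n * n)
    | to_list
)

# Stages can also be composed first and applied later.
square_evens = pfilter(lambda n: n % 2 == 0) | pmap(lambda n: n * n)
same_result = range(10) | square_evens | to_list

print(even_squares)
```

The parentheses around nested calls go away, but as noted above, the lambdas are still there.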

Having said that, IMO Domain Specific Languages are a no-go in Python. Inventing a new syntax/language for every possible problem domain is the Ruby/Lisp Way. The exception would be if pype were included in Python's standard library; in that case it would simply be Python syntax. pype's implementation is straightforward and the syntax is familiar, so there must be some simple reason why this approach hasn't received wide adoption.


"Microsoft has filed patents for technology that monitors the heart rate, blood pressure, galvanic skin response, facial expressions of office workers, and even their brain waves.

The idea, the patents say, is to let managers know if workers are experiencing heightened frustration or stress."

Consultant to Boss: As you can see on this graph compiled from brainwave data, your employees are experiencing heightened frustration from the fact we're monitoring their brainwaves.


I wonder if Scott Adams peruses this site for strip ideas.



