'Gloating' is so diametrically opposed to the view expressed in the article that I have no idea how to respond.

As for that part of the FAQ, it is intended as an explanation of some of the theorems proved in the paper and is a response to some of the theoretical objections we face from the data privacy community. It is not an issue that arises in practice.



This is the part that did you in:

Instead, you brushed off our claims, calling them “absolutely without merit,” among other things.

Now even if you are all emotionless, 100% objective researchers with no interest other than the greater good, this very sentence will make it impossible for humans to read your letter without inferring a certain level of snark.

If you want to project pure motives for the letter, it would have been best to leave their reaction to your original research out of it, or, probably more appropriately, not to publish an open letter at all and instead contact them directly.


Ah yes, Appeal to Motive, that'll do it, now I'm not going to accept anything they said because of my interpretation of their motives.

http://en.wikipedia.org/wiki/Appeal_to_motive


>now I'm not going to accept anything they said

I don't think anyone is refusing to accept anything he says. They're just disapproving of his motives, which is fine.


Open letters are about politics, not logic. What an individual critical thinker such as yourself believes has no bearing.


Open letters are about communicating a message; they are also by definition political, and it's really helpful to our social systems when they are based on logic.

Analysis of motives is of course valuable. But it's not an argument against the matter at hand. It may not have been clear that I was being facetious.


I think the parent is referring to an element of we-told-you-so in this letter, particularly the "Instead, you brushed off our claims" part. It sounds a bit confrontational, though perhaps it's too late to rewrite.


It sure makes the writers sound like asses.


Ironically (and I think this is a legitimate use of the term) I hadn't realized when writing my comment that 'randomwalker' and 'Arvind Narayanan' were one and the same. But I apologize for my misinterpretation. I've read it again, though, and still have to fight the same reaction. Perhaps I'm just prejudiced against the 'open letter' format, as I often see it used in that way.

Still, if you can overlook my misreading, I would love your response. I understand that the example from the FAQ is not central to your argument, but I do think it's important in understanding your worldview. If I provide a tool that can be used, erroneously or not, to create or enable prejudice, at what point have I crossed the line into 'violation of privacy'?


I responded to most of the points in a new top-level comment. As for whether there is a violation of privacy even if the conclusions are erroneous: no, I don't think there is. Some of my colleagues argue that there is, but I don't agree with that point of view. In other words, in order to convince you that there is a privacy problem, I have to convince you of the math in the paper. Merely the fact that we came up with an algorithm that may or may not be correct is not sufficient.

That FAQ question is a big red herring. Some of the objections to our paper during peer review were along the lines of "if Netflix had essentially duplicated every record in the database, how could you be sure you found the right record? What does 'right record' even mean?" No really, it was that silly. So it was meant as a way to justify the fact that de-anonymization can happen even if you didn't find "the right record."
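To make that concrete, here is a toy sketch of the kind of matching involved. This is not the algorithm from our paper, just an illustration; the records, scoring, and threshold below are all made up. The point is that a sparse handful of approximate observations is usually enough to make one record stand out, and even if duplicate records caused a tie, whichever copy you picked would reveal the same viewing history.

  # Toy sketch only: score each record in an "anonymized" ratings table against
  # a few outside observations about a target (movie, rating, approximate date)
  # and accept the best match only if it clearly beats the runner-up.
  # All names, dates, and thresholds here are hypothetical.

  def similarity(aux, record, date_tolerance=14):
      """Count auxiliary observations consistent with a candidate record."""
      score = 0
      for movie, (rating, day) in aux.items():
          if movie in record:
              r, d = record[movie]
              if r == rating and abs(d - day) <= date_tolerance:
                  score += 1
      return score

  def link(aux, dataset, margin=2):
      """Return the best-scoring record id if it beats the runner-up by `margin`.

      If the table contained duplicate copies of a record, the top scores would
      tie and there would be no single "right record" -- but whichever copy you
      picked would reveal the same viewing history, which is the privacy issue.
      """
      scored = sorted(((similarity(aux, rec), rid) for rid, rec in dataset.items()),
                      reverse=True)
      (top, top_id), (runner_up, _) = scored[0], scored[1]
      return top_id if top - runner_up >= margin else None

  # Hypothetical toy data: record id -> {movie: (rating, day number)}
  dataset = {
      "r1": {"MovieA": (5, 100), "MovieB": (3, 130), "MovieC": (4, 200)},
      "r2": {"MovieA": (1, 300), "MovieD": (2, 50)},
      "r3": {"MovieE": (4, 10)},
  }
  aux = {"MovieA": (5, 99), "MovieB": (3, 128), "MovieC": (4, 205)}
  print(link(aux, dataset))  # -> r1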


Instead, you brushed off our claims, calling them “absolutely without merit,” among other things. It has taken negative publicity and an FTC investigation to stop things from getting worse. Some may make the argument that even if the privacy of some of your customers is violated, the benefit to mankind outweighs it, but the “greater good” argument is a very dangerous one. And so here we are.

In other words, "we showed them!" -- seems an awful lot like gloating to me.

It also doesn't appear that you ever seriously considered the "greater good" argument, beyond asserting that it's "dangerous." This leaves me with the suspicion that you chose to take the easy way out instead of weighing the actual costs and benefits of what you were doing.


This leaves me with the suspicion that you chose to take the easy way out instead of weighing the actual costs and benefits of what you were doing.

You're shooting the messenger here. If a dataset can be de-anonymized, it's better to know that and be able to make informed decisions.


Well, I can say this: congrats on getting one of the best industrial data sets locked up. I hope you're proud of the work you did.

As for the possibility of Netflix running a contest like this in an online fashion, well, maybe, but the benefits of having direct access to the data are enormous. Plus you've now moved to a model where only a privileged few are allowed access via NDA, or Netflix has to provide computing resources to all researchers, etc. I don't see it happening.
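To be concrete about what an "online" version would have to look like: researchers never see the raw ratings; they submit predictions and get back only an aggregate score computed on Netflix's side against held-out data. The sketch below is hypothetical; the data, API shape, and defaults are made up, and only the RMSE metric mirrors how the actual contest was scored.

  # Hypothetical sketch of a hosted ("online") contest: the raw ratings never
  # leave the operator's servers; researchers submit predictions for
  # (user, movie) pairs and get back only an aggregate RMSE. Everything here
  # is illustrative, not Netflix's actual system.

  import math

  # Server-side only: held-out true ratings, keyed by (user_id, movie_id).
  _HELD_OUT = {
      ("u1", "m1"): 4.0,
      ("u1", "m2"): 3.0,
      ("u2", "m1"): 5.0,
  }

  def score_submission(predictions):
      """Return RMSE over the held-out set; reveal nothing per record."""
      errs = []
      for key, truth in _HELD_OUT.items():
          pred = predictions.get(key, 3.0)  # unsubmitted pairs get a default guess
          errs.append((pred - truth) ** 2)
      return math.sqrt(sum(errs) / len(errs))

  # A researcher's submission, computed on whatever training data the operator
  # chooses to expose (de-identified, synthetic, or behind an NDA).
  submission = {("u1", "m1"): 4.2, ("u1", "m2"): 2.8, ("u2", "m1"): 4.9}
  print(round(score_submission(submission), 4))  # -> 0.1732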


If Netflix had attached credit card info and social security numbers to the data, would you be singing the same tune? You're basically saying that you don't like the outcome because of your perceived utility of the data. Though I don't see you talking about:

  - Do you view this as a breach of privacy?
  - What do you consider private?
  - Do you view this as a breach of privacy, but just
    don't care?
  - Do you feel that the utility of the data out-weighs
    the privacy concerns?
  - What about the people that view this as an invasion
    of privacy and have their Netflix user data in that
    set? Should they be thrown under the bus in the pursuit
    of progress because *you* feel that the data has more
    utility than the privacy concerns do?

I see a lot of people arguing that this is 'stifling innovation,' but innovation is not an end unto itself. Banning the use of human test subjects against their will in the pursuit of scientific knowledge 'stifles innovation' too, but you would be hard-pressed to find many people who see that as a bad thing. "Stifling innovation" in the pursuit of privacy should be considered a noble cause; it benefits the public. This is hardly the same as the argument against intellectual property rights, and I really find it annoying that people seem to lump it into the same ballpark with these boilerplate "stifling innovation" comments.


I see it as a breach of privacy that people might have prevented had they known about the danger of their reviews being linked to their accounts. Many companies have had their large credit card databases stolen or hacked into through sheer incompetence; Netflix is not in the same boat as those companies.


So... having private info stolen == bad company, releasing private info == good company.


And if pigs were ducks, would they quack? In other words, your question is moronic, since Netflix didn't attach CC or SSN info.

Also, you're a retard for treating movie predictions, and the possibility of matching a person to their movie viewing history, as even remotely comparable to experimenting on human test subjects. Please.

As for invasion of privacy, I'm unsure -- I'm not sure of the probability of matching, the quantity of information necessary to get a good (for various values of good) match, etc. What is clear is that the authors had a major hand in locking up a nontrivial, non-academic dataset and damaging the community around it. They have further aided the lawyers suing Netflix, and have helped poison the well for any company that decides in the future that it might want to do something like this. So I say congratulations! For the author to pretend this didn't happen as a result of his actions is disingenuous.

As for your questions, well, they're just stupid. We live in a world where the fbi/police get access to your PHYSICAL LOCATION 24x7 without a warrant just by asking, where your emails and telephone calls are scanned by the nsa with plans to open this data set to the police at large, where google/yahoo/et al see turning your emails and access patterns over to the police as a revenue opportunity, etc. If you care about privacy, this is such small potatoes as to be a waste of time. BTW, anyone can still spend roughly $100 to access your phone call history. G has, in subpoenable form, your entire search history -- and don't think that clearing cookies prevents stapling that history together.


> We live in a world where the fbi/police get access to your PHYSICAL LOCATION 24x7 without a warrant just by asking, where your emails and telephone calls are scanned by the nsa with plans to open this data set to the police at large, where google/yahoo/et al see turning your emails and access patterns over to the police as a revenue opportunity, etc.

So you're saying that since government agencies have access to a lot of my private information, I shouldn't care about any of my private information remaining private? That sounds like a false dichotomy. You're presenting this as if I can only care about all of my private data or none of it: since the government has access to large portions of it and I don't have much (or any) control over that, I should therefore care about none of it. Isn't it possible for me to care about all of my private data, but to choose the battles that I fight?


And your hill to die on is that someone might guess a movie someone in your household rented. Okay then.


Death by a thousand cuts


Or are they opening the door for you to profit by selling access to a utility MapReduce cluster focused on their data set?



