>> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
That is a non sequitur when we are discussing opting in to social science research.
> That is a non sequitur when we are discussing opting in to social science research.
As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking poses a reidentification risk.
Regarding the risk of data linkages, how confident are you that Mozilla and others with access to the data will manage it ...
1. ... up to the currently-accepted level of knowledge (including hopefully some theoretical guarantees, if possible, and if not, mitigations with known kinds of risk) and ...
2. ... that the current level is acceptable given that history of data privacy doesn't paint a rosy picture?
To be open, I'm not interested in your confidence level per se, but rather the reasoning in your risk assessment. I want to weigh the various factors myself, in other words. For example, you appear to have more confidence in IRBs than I do.
Knowing the history of the "arms race" between deidentification and reidentification, I don't put a whole lot of trust in Institutional Review Boards. Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
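To make the linkage-attack concern concrete, here is a minimal sketch (all data made up) of the classic attack: two datasets that each look harmless on their own are joined on shared quasi-identifiers such as ZIP code, birth date, and sex, reidentifying a "deidentified" record.

```python
# Illustrative linkage attack. Neither dataset alone pairs a name with a
# diagnosis, but joining on quasi-identifiers does. All records are invented.

medical = [  # names removed, so nominally "deidentified"
    {"zip": "02139", "dob": "1945-07-22", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1982-03-04", "sex": "M", "diagnosis": "asthma"},
]

voter_roll = [  # public record, includes names
    {"name": "Jane Doe", "zip": "02139", "dob": "1945-07-22", "sex": "F"},
    {"name": "John Roe", "zip": "02144", "dob": "1982-03-04", "sex": "M"},
]

def link(records_a, records_b, keys=("zip", "dob", "sex")):
    """Join two record sets on quasi-identifier fields."""
    matches = []
    for a in records_a:
        for b in records_b:
            if all(a[k] == b[k] for k in keys):
                matches.append({**a, **b})
    return matches

reidentified = link(medical, voter_roll)
# One match: Jane Doe's diagnosis is now linked to her name, even though
# neither dataset contained that pairing by itself.
```

The point of the sketch is that the attacker needs no special access, only a second dataset that overlaps on a few mundane attributes.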
P.S. In my view, using "non sequitur" here is a bit strong, perhaps even off-putting. It is only a "non sequitur" because you are making different logical assumptions than the commenter. Another approach would be to say "your conclusion only holds if..." This would make your point without being so pointed. It also helps show that you want to understand the other person's assumptions.
> As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking poses a reidentification risk.
It appears that the parent commenter revised their comment to indicate that the concern was indeed “your data getting mixed with my data, when browsing Facebook”, to paraphrase.
My response there was essentially: ethical review would have to determine if all data must be provided through informed consent of all the originating humans.
Held to the gold standard of ethics, an IRB would likely have to reject a research design that did not provide a way for every individual human involved to give informed consent. If any single individual in a data set indicated that they did not consent, then that data set would need to be reshaped to exclude that individual. Failing that, the entire data set would have to be excluded from study.
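The rule described above can be sketched as a filter: keep only records attributable to a single consenting individual, and exclude the whole dataset if any record cannot be attributed that way. The record shape and names here are hypothetical, purely for illustration.

```python
# Hypothetical consent filter. Each record lists the individuals ("subjects")
# whose data it contains; `consent` maps individual -> opted in?

def filter_by_consent(dataset, consent):
    """Return only records of consenting individuals, or None if the
    dataset cannot be reshaped (a record mixes multiple people)."""
    kept = []
    for record in dataset:
        subjects = record["subjects"]
        if len(subjects) != 1:
            return None  # can't cleanly remove one person: exclude the dataset
        if consent.get(subjects[0], False):
            kept.append(record)
    return kept

consent = {"alice": True, "bob": False}
usable = filter_by_consent(
    [{"subjects": ["alice"], "data": 1}, {"subjects": ["bob"], "data": 2}],
    consent,
)
# Only alice's record survives; bob withheld consent, so his is dropped.
```

The hard part, as the next paragraph notes, is that real browsing data rarely decomposes into single-subject records, so the `return None` branch would fire constantly.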
Of course, that has some complex implications when it comes to broad categories of data sources for browser usage: social networking sites would be a minefield. Did the website author consent to their content being machine-analyzed for sentiment, etc., if one really wanted to get down to it? You’d have to consider each and every resource location. You can’t assume that all browser traffic is open web traffic - someone could have left their Rally extension running while navigating a corporate confidential network, content under complex copyrights, etc.
My understanding is that the US Supreme Court is about to decide whether “if you can read it, you can keep it” holds, as a consequence of hiQ Labs v. LinkedIn (a Microsoft subsidiary), so don’t forget the “arms race” of justice, either.
> Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
Indeed, even just basic double-blind medical studies are hard to defend when you consider operational security, let alone information security.
In case it is of interest, here is a fairly short article with a historical look at data de-identification. If nothing else, it is one jumping-off point.