Have you thought about using central/global differential privacy (which tends to require much less noise than local differential privacy) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."
It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, and Public Transport Victoria case studies for examples of how informal attempts at anonymization can fail users.
> Have you thought about using central/global differential privacy (which tends to require much less noise than local differential privacy) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.
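For readers unfamiliar with the distinction: in the central model the curator holds the raw study data and only the released aggregate is noised, so a count query needs just a few units of noise total rather than noise per participant. Here's a minimal sketch of such a release via the Laplace mechanism; the function names, the sensitivity-1 assumption (each participant counted at most once), and the epsilon value are illustrative, not Rally's actual pipeline:

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Inverse-CDF sampling from a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float = 0.5) -> float:
    # Each participant contributes at most once, so sensitivity is 1 and the
    # noise scale is 1 / epsilon -- a few units regardless of dataset size.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

# Illustrative use: publish only the noisy aggregate, never the raw records.
records = [{"visited_news_site": random.random() < 0.3} for _ in range(10_000)]
print(dp_count(records, lambda r: r["visited_news_site"]))
```

Because the noise is a constant few units while the true count grows with the number of participants, the relative error of the published aggregate shrinks as the study scales, which is the sense in which central DP needs much less noise than adding noise on each user's device.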
> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, and Public Transport Victoria case studies for examples of how informal attempts at anonymization can fail users.
I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.