
Can such methods be used to "fingerprint" proprietary datasets by tainting them? For example, I want to make sure that my dataset is not stolen and used by someone else (Waymo?). So I would taint it using an adversarial method and create a "canary test set" that uniquely identifies whether my dataset was used in training.
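
A minimal sketch of what such a canary scheme could look like (all names here are hypothetical; the actual taint would come from your adversarial method, stubbed out below as random noise, and `model.predict` assumes a Keras-style classifier):

    import numpy as np

    def make_canaries(images, labels, n_classes, rng):
        # Perturb a held-out slice of the dataset and assign deliberately
        # "wrong" labels, so that only a model trained on the tainted data
        # learns the association.
        noise = rng.normal(scale=0.05, size=images.shape)  # stand-in taint
        canary_x = np.clip(images + noise, 0.0, 1.0)
        canary_y = (labels + 1) % n_classes  # shifted, incorrect labels
        return canary_x, canary_y

    def suspect_trained_on_my_data(model, canary_x, canary_y, n_classes):
        # A model that never saw the canaries should match the shifted
        # labels at roughly chance level (1/n_classes); a much higher hit
        # rate suggests the tainted set leaked into training.
        preds = model.predict(canary_x).argmax(axis=1)
        hit_rate = (preds == canary_y).mean()
        return hit_rate > 5.0 / n_classes  # arbitrary decision threshold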


In one sense, no. A trainer can guarantee the privacy of any given input (or any subset of k inputs) by transferring knowledge from an ensemble of models trained on disjoint subsets of the training data [0][1]. This is useful if, for instance, you train on medical data and don't want anyone to be able to tell that "John Doe, HIV+" was part of the input. If your adversary does not take such precautions, however, then your canary should work.

[0] https://arxiv.org/abs/1610.05755

[1] https://static.googleusercontent.com/media/research.google.c...
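
Roughly, the mechanism in [0] (PATE) trains "teacher" models on disjoint partitions of the private data and answers label queries only through a noisy vote. A minimal sketch of that aggregation step, assuming already-trained teachers with Keras-style `predict` (the names and the noise parameter gamma are illustrative):

    import numpy as np

    def noisy_aggregate_label(teachers, x, n_classes, gamma, rng):
        # Each teacher, trained on a disjoint partition of the private
        # data, votes for a class; Laplace noise on the vote counts hides
        # the influence of any single training record on the winner.
        votes = np.zeros(n_classes)
        for t in teachers:
            votes[t.predict(x[None, ...]).argmax()] += 1
        votes += rng.laplace(scale=1.0 / gamma, size=n_classes)
        return int(votes.argmax())

A "student" model is then trained only on such noisy labels over public data, so no single private record (including a planted canary) can measurably shift its behavior.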


I believe Google Maps has fake streets as a canary for detecting when its map data is scraped...


Yup, in cartography these are known as "trap streets".


You don't even need an adversarial attack method to fingerprint or watermark a proprietary dataset. If the dataset consists of images, you could simply watermark them, or mark them using steganography. The watermark would appear as random (non-targeted) noise and would be mostly invisible to current classifiers. Note that the noise employed by adversarial attacks is very different: it is highly targeted.
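
As a toy illustration, here is a least-significant-bit watermark in plain NumPy (a sketch only; a robust scheme would spread the bits pseudorandomly across pixels and be designed to survive re-encoding, which naive LSB embedding does not):

    import numpy as np

    def embed_watermark(image_u8, bits):
        # Hide a bit string in the least significant bit of each pixel.
        # The change is at most 1/255 per channel: invisible to the eye
        # and effectively random noise to a classifier.
        flat = image_u8.flatten()  # flatten() returns a copy
        flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=np.uint8)
        return flat.reshape(image_u8.shape)

    def extract_watermark(image_u8, n_bits):
        # Read the hidden bits back out of a suspect copy.
        return (image_u8.flatten()[:n_bits] & 1).tolist()

Note that this particular embedding is destroyed by lossy compression such as JPEG, which is one reason production watermarking uses more redundant encodings.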



