"We try not to delete entire account histories because that would gut the threads the account had participated in. However, we care about protecting individual users and take care of privacy requests every day, so if we can help, please email hn@ycombinator.com. We don't want anyone to get in trouble from anything they posted to HN. More here [1]."
Then for the threads to be preserved, the content should remain, but the associated username should just be changed.
I imagine that if HN created an account with the username "DELETED" or similar, a script could just change comment ownership from the account being deleted to the special "deleted" account. That would be the easiest to implement and would also keep thread continuity.
(Don't delete the comment; just delete the connection to the user.)
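To make that concrete, here is a minimal sketch of such a script. It assumes a made-up SQL schema (`users` and `comments` tables) and a hypothetical `DELETED` placeholder account; HN's real storage is not described anywhere in this thread, so treat every name below as illustrative only.

```python
import sqlite3

PLACEHOLDER = "DELETED"  # hypothetical special account name


def make_demo_db():
    """Build an in-memory database with a made-up users/comments schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (name TEXT PRIMARY KEY);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT);
        INSERT INTO users VALUES ('DELETED'), ('alice'), ('bob');
        INSERT INTO comments (author, body) VALUES
            ('alice', 'first comment'),
            ('bob', 'a reply to alice'),
            ('alice', 'second comment');
        """
    )
    return conn


def anonymize_account(conn, username):
    """Reassign all of `username`'s comments to the placeholder, then drop the account.

    Returns the number of comments that were reassigned.
    """
    if username == PLACEHOLDER:
        raise ValueError("cannot anonymize the placeholder account itself")
    with conn:  # both statements commit together, or roll back on error
        cur = conn.execute(
            "UPDATE comments SET author = ? WHERE author = ?",
            (PLACEHOLDER, username),
        )
        conn.execute("DELETE FROM users WHERE name = ?", (username,))
    return cur.rowcount
```

Calling `anonymize_account(make_demo_db(), "alice")` leaves every comment body in place and only rewrites the `author` column, so the threads stay intact.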
I don't have a good suggestion here, but I have to point out that that does not fully address OP's stated concern about the use of text analysis techniques to figure out who wrote what, even if the username is different.
(It does probably make those techniques more difficult since it would mix comments together from multiple authors under the "deleted" username, but it doesn't fully remove the danger.)
If all deleted accounts' usernames were replaced with "[deleted]", that would hypothetically do a pretty decent job of defeating text analysis techniques. A single post isn't really enough to characterize someone's writing style, and a sufficiently large pool of deleted accounts would make it quite difficult to reliably pick individuals out of the slush pile and group their comments together.
That said, HN is being archived and mirrored in I-don't-know-how-many-places, and I'm not sure how feasible it is to track all those places down and get them to expunge your userid, too. And this is all assuming nobody comes up with a new de-anonymization technique that deals with it well. That is a rather big assumption considering new ones are being developed all the time.
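For anyone wondering what the text analysis mentioned above might look like in its simplest form, here is a rough sketch. It assumes character trigram counts compared by cosine similarity, which is only one of many possible stylometric features and not necessarily what any real tool uses; the function names are made up for illustration.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(known_text, candidates):
    """Sort candidate comments by similarity to a known author's text, best first."""
    known = trigram_profile(known_text)
    scored = [(cosine_similarity(known, trigram_profile(c)), c) for c in candidates]
    return sorted(scored, reverse=True)
```

A short comment yields only a handful of trigrams, so its profile is noisy. That is the intuition behind the point that a single post isn't really enough to characterize someone's writing style.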
I imagine the same analysis can be performed on other networks like reddit, twitter, github, linkedin, etc. to find matches amongst them all; [deleted] is a signal as well. If there’s a strong match across one or more of those and a deleted one here, or vice versa to rule out possible matches, and the others are not anonymized [well enough], then it could probably deanonymize quite a few deleted accounts here.
I’m sure something like this is available to recruiters or other HR/business admin; I remember seeing browser extensions/SaaSes years ago that were trying to tie together social media identities.
For a few years now I've imagined an AI that can ingest all my writing across platforms, figure out it's me via this type of analysis, find any information I leak, and archive that data in perpetuity. Then it could be used to judge me for whatever purpose its owner deems worthy, which given my age will probably mean selling me boner pills in a decade or two.
It feels like we killed god and then re-invented him. And I think that if you don't want his gaze and judgement to fall on you, then your only option is not to participate in online discussion, and probably not even read it because you can probably learn a lot about someone just from passively tracking the things they follow online.
It does if you can't determine the accounts are the same. Right now HN does nothing to address these concerns. It is ridiculous anyone finds this acceptable.
Really good point. You could estimate if a [deleted] comment might be from [account X], but you'd only have that one comment to compare. The rest might be from other accounts.
So yeah, you'd probably end up with a pile of comments that are more likely linked to [account X], but many of them wouldn't actually be. It would add a ton of noise into the system.
You would get some hints. There have been situations where someone said "like AnimalMuppet said upthread..." or something like that. But they weren't very common. Maybe 1% of my comments could be definitely identified like that.
Is that enough to define a "style" to determine the rest of my comments? Is it enough to doxx me from the comments where someone else names my nick?
If you took all the sentences out of all of the books in a library, and mixed the sentences together under one fake author name, I think it would be impossible to correctly attribute 99.99+% of the sentences to the correct author.
But we're not going to delete everybody's account and mix them all together, are we?
If I took a bunch of "minor" writing and mixed it together under one fake name, but some of it was written by a famous author under a pen-name, then yes, in theory it could be possible to identify those.
That's a very theoretical problem though, isn't it? Individual comments don't have the length of books, and usually don't individually contain enough text to be unique. Once the account-relationship is gone, it's essentially like splitting all the books up into paragraphs and trying to attribute individual paragraphs.
Unless you're Wittgenstein or someone with a similar interest in exploring how long sentences can be, I doubt there's enough there.
Of course, all of that is hardly useful, since HN is very open and lots of people have copies of all comments.
Okay, 1% of the library. I see threads here every so often asking about deletion, and I imagine there are more people who would like the opportunity but already know the answer and don't ask.
Curious what happens when an HN user inevitably wants their dead-name changed but wants to retain their history, and whether that would be a harder path to march than being deleted for privacy reasons.
Edit: Thank you for the clarification (reply), I was not aware of the term "dead-name" referring to that. I'm still not exactly sure what you are asking, though. Do you mean the case where a replier wrote the original username in an old reply, before the name change? In that extremely rare case, it might cause some confusion if a third person reads the old thread, in which case perhaps the user initiating the name change could use the search function to find any instances of that, and then email the HN admin?
There is a high incidence of transgender people in our community. Often, names are changed, and the old name is referred to as a dead-name. Addressing someone by this deprecated title is the height of disrespect.
I'm not one of those who are affected, so unfortunately I'm without a sound rebuttal. You do raise a good point. Given that I'm not, someone who is may have more perspective that'd support one assertion vs the other.
Possibly. It doesn't hurt to try to reduce the attack surface, though.
A lot of the stuff I wrote on old popular, public websites (that still exist) can no longer be found via search engine, and I did not take any action; it just disappeared on its own.
You can take the possibility out of it when you look at Google's big data. It's not a question of "if it exists out there" or "how often it is updated if it exists out there."
4chan, which is usually fully anonymous, has per-thread IDs on some boards. You can see that a poster is the same person within that thread, but you can't link them to a poster in another thread.
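One way thread-scoped IDs like that could be built, purely as a sketch: derive the displayed ID from a keyed hash of the user and the thread. This is an assumption about how such a scheme might work, not a description of 4chan's actual implementation; `secret_key`, `user_id`, and the 8-character length are all hypothetical.

```python
import hashlib
import hmac


def thread_scoped_id(secret_key: bytes, user_id: str, thread_id: int, length: int = 8) -> str:
    """Derive a short ID that is stable within one thread but differs across threads.

    Without the server-side secret, IDs from two threads cannot be tied
    back to the same user.
    """
    message = f"{thread_id}:{user_id}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return digest[:length]
```

The same user gets the same ID everywhere inside a thread, so conversations stay readable, while nothing displayed publicly carries over to other threads.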
So long as the account is permanently locked from ever participating again in any way, I think renaming is great. Otherwise it needs to be left as-is.
But the comments need to remain owned by the account that created them, so that we can consider the history of each commenter when reviewing their comments in the future. Merging them all into DELETED inappropriately coalesces those histories.
I feel for those who are just now realizing that they can be located by their stylistic tendencies. It’s Dejanews all over again.
The problem still remains: HN’s “API” is incredibly simple and people have full datasets downloaded locally for every comment. In this case, the OP is already out of luck if he’s looking for anonymity against a hostile entity.
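To show how little code it takes, here is a small sketch against the public HN Firebase API. The item endpoint and the `by`/`text` fields are part of that API; the helper names and the returned dictionary shape are my own choices for illustration.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the URL for a single story or comment."""
    return f"{API_ROOT}/item/{item_id}.json"


def parse_item(payload: str) -> dict:
    """Pull the author and text out of an item's JSON payload.

    A missing item comes back as JSON null, and deleted items may lack
    the `by` and `text` fields, so every lookup tolerates absence.
    """
    item = json.loads(payload) or {}
    return {"id": item.get("id"), "author": item.get("by"), "text": item.get("text", "")}


def fetch_item(item_id: int) -> dict:
    """Download and parse one item (requires network access)."""
    with urlopen(item_url(item_id), timeout=10) as response:
        return parse_item(response.read().decode("utf-8"))
```

For example, `fetch_item(23623799)` would retrieve the item linked as [1] in the FAQ quote. Looping over item IDs is all a full mirror needs.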
That problem seems like an extreme outlier. Such user protection would prevent "crimes of opportunity". The average person is not going to have a constant backup of HN in case one day they might want to spy on someone's past.
The problem doesn’t appear to be that much of an extreme outlier: the thread poster is concerned about a specific tool. That tool has already downloaded the complete data set, so he’s already lost.
And there's no guarantee that service will stay around, or that they won't accept requests. I still think it's worthwhile to reduce the attack surface.
And if they do, they're likely to start with Twitter or Facebook - something useful against more of the population. HN users are still very much a minority.
"We refuse to help because it's a mild inconvenience to us and we'll justify it by assuming that it won't help without knowing for a fact that it won't."
That's a fascinating stance that you've outlined and that others have parroted. A stance that HN has implied with the reply to OP.
Not completely. For example, if the analyst has a large corpus from someone's main account to build up a profile, it seems plausible to me that they could identify individual comments under the "deleted" user as being written by the same person using a throwaway account, especially if they have a distinctive writing style.
It does if the content from ALL deleted users gets merged under the same metauser. Stylometric average of all deleted users' comments is pretty useless.
You imply that the (unconfirmed) harm to the individual is more important to avoid than the harm to society done by removing potentially insightful comment threads. Why is that?
I saw that but I’m looking for an automated approach that doesn’t involve email, which increases the surface area for doxxing. I feel like this is a pretty humble request; HN is the only site I can think of that doesn’t let users delete their own data. The world is a much different place than it was when HN was founded and it seems like this feature would be important to many people.
I consider my HN comments to be contributions to the HN community. I received some benefit in return for those comments, e.g. responses that improve my thinking.
I may retain copyright over those comments, but by posting them on a public forum I've given that forum licence to publish them.
This is the full license you give for posting to HN:
“By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.” [1]
That page links to another page for California residents, which includes:
Exercising Your Rights: California residents can exercise the right to request deletion of Personal Information by contacting us at hn@ycombinator.com.
Regardless of your license, the site needs to comply with local regulations. Under GDPR especially, consent can be withdrawn at any time, even if it was given earlier.
> an automated approach that doesn’t involve email, which increases the surface area for doxxing.
Under what threat model does it meaningfully increase the surface area? If you're worried about HN admins then I think email is the least of your concerns (I'm pretty sure they can see your IP address), and if you're worried about the general public then your email isn't being leaked to them so it shouldn't matter.
While [0] doesn't come across as GDPR-compatible to me (not a lawyer), the further explanation in [1] sounds a lot more compatible with it.
Basically, HN will work with a requester to update the site to give the desired amount of anonymity whilst preserving history as much as possible with those limitations -- including editing past comments.
Full GDPR compatibility would probably require supporting complete removal of the user name and the comment/submission contents as written, but even that seems on the table in [1]. (DanG could simply summarise each comment worth multiple replies and delete all the ones without replies.)
Whilst the intention may be admirable, it doesn't look like this would be compliant with the GDPR right to be forgotten which applies to any natural person who can be identified.
To my understanding they would still be GDPR-compliant if they delete your data upon receiving an email saying that you would like to exercise this right under GDPR, even if they don't automate that process, but IANAL. Perhaps someone can confirm whether this has in fact worked for them in the past.
There is no requirement to automate GDPR requests.
However, all organisations must be able to handle GDPR requests via any communication channel. E.g., they need to treat a data deletion request sent via Twitter DM as a valid request if they have an official Twitter presence.
It is insufficient to require that the customer fill out a special web form.
IANAL but I don't think it matters whether the purpose of collection is specifically to facilitate paid features. From the European Commission:
> The GDPR applies to:
[...]
> 2. a company established outside the EU and is offering goods/services (paid or for free) or is monitoring the behaviour of individuals in the EU.
Assuming account names or the content of comments constitute personal data within GDPR, I think YCombinator falls into this group.
Edit: I forgot HN collects an optional email address too, which is definitely personal data.
The GDPR applies to the data of people residing in the EU. The location and profitability of the organization collecting the data isn’t a factor. (Though it may introduce questions of enforcement.)
"Can I delete my account?
We try not to delete entire account histories because that would gut the threads the account had participated in. However, we care about protecting individual users and take care of privacy requests every day, so if we can help, please email hn@ycombinator.com. We don't want anyone to get in trouble from anything they posted to HN. More here [1]."
[0] https://news.ycombinator.com/newsfaq.html
[1] https://news.ycombinator.com/item?id=23623799