See the FAQ [0]: "Can I delete my account? We try not to delete entire account h...

MerelyMortal · on Nov 27, 2022

Then for the threads to be preserved, the content should remain, but the associated username should just be changed.

I imagine if HN created an account with the username "DELETED" or similar, that a script could just change comment ownership from the account to be deleted to the special "deleted" account - that would be the easiest to implement as well as keep thread continuity.

(Don't delete the comment just delete the connection to the user.)
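A minimal sketch of the reassignment idea above, assuming comments live in a simple in-memory list. The `Comment` record, the `reassign_comments` helper, and the `[deleted]` placeholder name are all hypothetical; nothing here reflects how HN actually stores its data.

```python
from dataclasses import dataclass

DELETED = "[deleted]"  # hypothetical name of the shared placeholder account


@dataclass
class Comment:
    id: int
    author: str
    text: str


def reassign_comments(comments, username, placeholder=DELETED):
    """Point every comment by `username` at the placeholder account.

    The comment text is left untouched, so thread continuity is preserved.
    Returns the number of comments that changed owner.
    """
    changed = 0
    for comment in comments:
        if comment.author == username:
            comment.author = placeholder
            changed += 1
    return changed


comments = [
    Comment(1, "alice", "I agree."),
    Comment(2, "bob", "No, that's BS."),
    Comment(3, "alice", "He has his point."),
]
n = reassign_comments(comments, "alice")
```

After the call, both of alice's comments belong to the placeholder account and bob's comment is unchanged.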

sparky_z · on Nov 27, 2022

I don't have a good suggestion here, but I have to point out that that does not fully address OP's stated concern about the use of text analysis techniques to figure out who wrote what, even if the username is different.

(It does probably make those techniques more difficult since it would mix comments together from multiple authors under the "deleted" username, but it doesn't fully remove the danger.)

mumblemumble · on Nov 27, 2022

If all deleted accounts' usernames were replaced with "[deleted]", that would hypothetically do a pretty decent job of defeating text analysis techniques. A single post isn't really enough to characterize someone's writing style, and a sufficiently large pool of deleted accounts would make it quite difficult to reliably pick individuals out of the slush pile and group their comments together.

That said, HN is being archived and mirrored in I-don't-know-how-many-places, and I'm not sure how feasible it is to track all those places down and get them to expunge your userid, too. And this is all assuming nobody comes up with a new de-anonymization technique that deals with it well. That is a rather big assumption considering new ones are being developed all the time.

sixstringtheory · on Nov 27, 2022

I imagine the same analysis can be performed on other networks like reddit, twitter, github, linkedin, etc. to find matches amongst them all; [deleted] is a signal as well. If there’s a strong match between one or more of those and a deleted one here, or vice versa to rule out possible matches, and the others are not anonymized [well enough], then it could probably deanonymize quite a few deleted accounts here.

I’m sure something like this is available to recruiters or other HR/business admin, I remember seeing browser extensions/SaaSes years ago that were trying to tie together social media identities.

hotpotamus · on Nov 27, 2022

For a few years now I've imagined an AI that can ingest all my writing across platforms, figure out it's me via this type of analysis, find any information I leak, and archive that data in perpetuity. Then it could be used to judge me for whatever purpose its owner deems worthy; which given my age will probably mean selling me boner pills in a decade or two.

It feels like we killed god and then re-invented him. And I think that if you don't want his gaze and judgement to fall on you, then your only option is not to participate in online discussion, and probably not even read it because you can probably learn a lot about someone just from passively tracking the things they follow online.

aatd86 · on Nov 27, 2022

I think it would decrease the Signal to Noise ratio sufficiently.

That would be different if responses hardcoded mentions of the username.

roflyear · on Nov 27, 2022

It does if you can't determine the accounts are the same. Right now HN does nothing to address these concerns. It is ridiculous anyone finds this acceptable.

lijogdfljk · on Nov 27, 2022

Really good point. You could estimate if a [deleted] comment might be from [account X], but you'd only have that one comment to compare. The rest might be from other accounts.

So yea, you'd probably end up with a pile of comments that are more likely linked to [account X], but many of them wouldn't actually be. It would add a ton of noise into the system.

AnimalMuppet · on Nov 27, 2022

You would get some hints. There have been situations where someone said "like AnimalMuppet said upthread..." or something like that. But they weren't very common. Maybe 1% of my comments could be definitely identified like that.

Is that enough to define a "style" to determine the rest of my comments? Is it enough to doxx me from the comments where someone else names my nick?

MerelyMortal · on Nov 27, 2022

If you took all the sentences out of all of the books in a library, and mixed the sentences together under one fake author name, I think it would be impossible to correctly attribute 99.99+% of the sentences to a correct author.

I think that is sufficient.

sparky_z · on Nov 27, 2022

But we're not going to delete everybody's account and mix them all together, are we?

If I took a bunch of "minor" writing and mixed it together under one fake name, but some of it was written by a famous author under a pen-name, then yes, in theory it could be possible to identify those.

luckylion · on Nov 27, 2022

That's a very theoretical problem though, isn't it? Individual comments don't have the length of books, and usually don't individually contain enough text to be unique. Once the account-relationship is gone, it's essentially like splitting all the books up into paragraphs and trying to attribute individual paragraphs. Unless you're Wittgenstein or someone with a similar interest in exploring how long sentences can be, I doubt there's enough there.

Of course, all of that is hardly useful, since HN is very open and lots of people have copies of all comments.

MerelyMortal · on Nov 27, 2022

Okay, 1% of the library. I see threads here every so often asking about deletion, and I imagine there are more people who would like the opportunity but already know the answer and don't ask.

aliqot · on Nov 27, 2022

Curious what happens when an HN user inevitably wants their dead-name changed but retain their history, and whether that would be a harder path to march than being deleted for privacy reasons.

MerelyMortal · on Nov 27, 2022

I don't understand the question.

Edit: Thank you for the clarification (reply), I was not aware of the term "dead-name" referring to that. I still am not exactly sure what you are asking though. If a replier wrote the original username in an old reply that was written before the name change? In that extremely rare case, it might cause some confusion if a third person reads the old thread, in which case perhaps the user initiating the name change could use the search function for any instances of that, and then email the HN admin?

aliqot · on Nov 27, 2022

There is a high incidence of transgender people in our community. Often, names are changed, the old name is referred to as a dead-name. Addressing someone by this deprecated title is the height of disrespect.

jakelazaroff · on Nov 27, 2022

If you email hn@ycombinator.com, they'll change your username for you. (Near the bottom of https://news.ycombinator.com/newsfaq.html)

aliqot · on Nov 27, 2022

All of those comments were written from the perspective of a person who no longer 'exists' tangibly as they've transitioned.

MerelyMortal · on Nov 27, 2022

Then perhaps what the transitioning person is looking for, is just creating a new account to go with their new self?

aliqot · on Nov 27, 2022

I'm not one of those who are affected, so unfortunately I'm without a sound rebuttal. You do raise a good point. Given that I'm not, someone who is may have more perspective that'd support one assertion vs the other.

jll29 · on Nov 27, 2022

It should not be ONE account, or it wouldn't be possible to distinguish individual users in a conversational thread from one another:

DELETED_USER: I agree.

DELETED_USER: No, that's BS.

DELETED_USER: He has his point.

Better would be:

DELETED_USER_1: I agree.

DELETED_USER_2: No, that's BS.

DELETED_USER_3: He has his point.
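The numbering scheme above could be sketched roughly as follows, assuming a thread is just an ordered list of (author, text) pairs. The `pseudonymize_thread` helper and the data layout are hypothetical. Labels are assigned per thread in order of first appearance, so the same deleted user keeps one label within a thread but the label carries no meaning across threads.

```python
def pseudonymize_thread(thread, deleted_users):
    """Replace deleted authors with DELETED_USER_n labels local to this thread.

    thread: list of (author, text) tuples in display order.
    deleted_users: set of usernames whose accounts were deleted.
    """
    labels = {}  # real username -> thread-local label
    result = []
    for author, text in thread:
        if author in deleted_users:
            if author not in labels:
                labels[author] = f"DELETED_USER_{len(labels) + 1}"
            author = labels[author]
        result.append((author, text))
    return result


thread = [
    ("alice", "I agree."),
    ("bob", "No, that's BS."),
    ("alice", "He has his point."),
    ("carol", "Interesting thread."),
]
anonymized = pseudonymize_thread(thread, {"alice", "bob"})
```

Here alice's two comments both show up as DELETED_USER_1, bob becomes DELETED_USER_2, and carol, who did not delete her account, is left alone.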

threatofrain · on Nov 27, 2022

Then someone might use an archived version from the trivially scrapable API to recover the information. This is a site for hackers after all.

chrisandchris · on Nov 27, 2022

Probably web.archive has indexed one of your posts _before_ your account name changed to DELETED, so your change will be useless.

The internet never forgets.

MerelyMortal · on Nov 27, 2022

Possibly, it doesn't hurt to try to reduce the attack surface though.

A lot of the stuff I wrote on old popular, public websites (that still exist) can no longer be found via search engine, and I did not take any action, it just disappeared on its own.

shagie · on Nov 27, 2022

You can take the possibility out of it when you look at Google's big data. It's not a question of "if it exists out there" or "how often it is updated if it exists out there."

https://console.cloud.google.com/marketplace/details/y-combi... is updated daily.

MerelyMortal · on Nov 27, 2022

There is no guarantee that copy will always be available or that they won't remove data.

Just because there happens to be a copy already, doesn't mean that the original can't be removed to prevent others from making copies in the future.

Zababa · on Nov 27, 2022

It does actually. At least once a month I go on a multi-hour quest to find an old thing, and frequently I can't find it.

counttheforks · on Nov 27, 2022

Then you send them GDPR right to be forgotten requests next. Alternatively, HN can force them to delist the content.

Zababa · on Nov 27, 2022

4chan, which is usually fully anonymous, has per-thread IDs on some boards. You can see that it's the same person within that thread, but can't link them to a poster in another thread.
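One way to get IDs that are stable inside a thread but unlinkable across threads is a keyed hash over the thread ID and the username. This is only a sketch of the general idea, not a description of how 4chan implements it; `SERVER_SECRET`, `thread_local_id`, and the 8-character length are hypothetical choices.

```python
import hashlib
import hmac

# Hypothetical server-side secret. Without it, an outsider cannot recompute
# the IDs from a list of known usernames.
SERVER_SECRET = b"replace-with-a-random-secret"


def thread_local_id(username: str, thread_id: int,
                    secret: bytes = SERVER_SECRET) -> str:
    """Derive a short ID that is stable within one thread only."""
    message = f"{thread_id}:{username}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return digest[:8]  # truncated for display


same_thread_a = thread_local_id("alice", 1001)
same_thread_b = thread_local_id("alice", 1001)
other_thread = thread_local_id("alice", 1002)
```

The same user gets the same ID on every comment in thread 1001, and an unrelated-looking ID in thread 1002.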

altairprime · on Nov 27, 2022

So long as the account is permanently locked from ever participating again in any way, I think renaming is great. Otherwise it needs to be left as-is.

But the comments need to remain owned by the account that created them, so that we can consider the history of each commenter when reviewing their comments in the future. Merging them all into DELETED inappropriately coalesces those histories.

I feel for those who are just now realizing that they can be located by their stylistic tendencies. It’s Dejanews all over again.

philwelch · on Nov 27, 2022

That still doesn’t account for the stylometry.

MerelyMortal · on Nov 27, 2022

Yes it does, not everyone gets a unique "deleted" account. Everyone's comment gets attributed to a single special "deleted" account.

Operyl · on Nov 27, 2022

The problem still remains: HN’s “API” is incredibly simple and people have full datasets downloaded locally for every comment. In this case, the OP is already out of luck if he’s looking for anonymity against a hostile entity.

MerelyMortal · on Nov 27, 2022

That problem seems like an extreme outlier. Such user protection would prevent "crimes of opportunity". The average person is not going to have a constant backup of HN in case one day they might want to spy on someone's past.

Operyl · on Nov 27, 2022

The problem doesn’t appear to be that much of an extreme outlier: the thread poster is concerned about a specific tool. That tool has already downloaded the complete data set, so he’s already lost.

MerelyMortal · on Nov 27, 2022

And there's no guarantee that service will stay around, or that they won't accept requests. I still think it's worthwhile to reduce the attack surface.

AnimalMuppet · on Nov 27, 2022

And if they do, they're likely to start with Twitter or Facebook - something useful against more of the population. HN users are still very much a minority.

naniwaduni · on Nov 27, 2022

Nothing going forward can help in that case, but we can still weigh impact on other threat models going forward.

juancb · on Nov 27, 2022

"We refuse to help because it's a mild inconvenience to us and we'll justify it by assuming that it won't help without knowing for a fact that it won't."

That's a fascinating stance that you've outlined and that others have parroted. A stance that HN has implied with the reply to OP.

Operyl · on Nov 28, 2022

I’m not in charge of anything at HN, so how can my statement say HN won’t do something?

sparky_z · on Nov 27, 2022

Not completely. For example, if the analyst has a large corpus from someone's main account to build up a profile, it seems plausible to me that they could identify individual comments under the "deleted" user as being written by the same person using a throwaway account, especially if they have a distinctive writing style.

almostnormal · on Nov 27, 2022

I doubt a single post provides a sufficient amount of information for that.

almostnormal · on Nov 27, 2022

Or a different deleted_user_<random or hash of post> for every post.

But as the sibling post says... it doesn't solve anything.

philwelch · on Nov 27, 2022

So you run the stylometric analysis on each comment to cluster them into inferred user profiles.
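A toy sketch of what per-comment clustering could look like, assuming character trigram counts as the only style feature and a greedy single-pass grouping. Real stylometric tools use much richer features; the function names and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(comments, threshold=0.5):
    """Greedily group comments whose profile resembles a cluster's first member.

    Returns a list of clusters, each a list of comment indices.
    """
    profiles = [trigram_profile(c) for c in comments]
    clusters = []
    for i, profile in enumerate(profiles):
        for members in clusters:
            if cosine(profile, profiles[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


groups = cluster(["aaaa bbbb", "aaaa bbbb", "xyz xyz"])
```

With short comments the profiles are very sparse, which is the limitation other replies in this thread point to: a single post carries little signal, so the inferred groupings are noisy.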

MerelyMortal · on Nov 27, 2022

I'm okay with someone attempting to do that. I imagine it would be extremely fuzzy and not successful.

matrix_overload · on Nov 27, 2022

It does if the content from ALL deleted users gets merged under the same metauser. Stylometric average of all deleted users' comments is pretty useless.

chinathrow · on Nov 27, 2022

I disagree.

If a user asks for their comment to be deleted, then the right thing to do is to delete them. Period.

andybak · on Nov 27, 2022

I respectfully disagree that it's always the right thing to do. Outside of issues of safety, I think the balance shifts.

Safety issues are a different matter, of course.

pbhjpbhj · on Nov 27, 2022

What makes it the right thing to do?

chinathrow · on Nov 28, 2022

Just imagine yourself in the position of wanting something deleted that was posted by an earlier you and/or is about yourself.

If people ask nicely, can confirm ownership/authorship of the record/data in question, then why would you be in a position to deny such a request?

fsflover · on Nov 28, 2022

You imply that the (unconfirmed) harm to the individual is more important to avoid than the harm to the society done by removing potentially insightful comment thread. Why is that?

chinathrow · on Nov 28, 2022

Not sure - but your question implies that there is harm to the society done, and from your first part of the sentence, isn't that also unconfirmed?

fsflover · on Nov 28, 2022

I think it's clear that removing discussions from HN harms everyone, because there are many interesting ideas here.

wannabeanon · on Nov 27, 2022

I saw that but I’m looking for an automated approach that doesn’t involve email, which increases the surface area for doxxing. I feel like this is a pretty humble request; HN is the only site I can think of that doesn’t let users delete their own data. The world is a much different place than it was when HN was founded and it seems like this feature would be important to many people.

rahimnathwani · on Nov 27, 2022

"doesn’t let users delete their own data"

I consider my HN comments to be contributions to the HN community. I received some benefit in return for those comments, e.g. responses that improve my thinking.

I may retain copyright over those comments, but by posting them on a public forum I've given that forum licence to publish them.

9wzYQbTYsAIc · on Nov 27, 2022

This is the full license you give for posting to HN:

“By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.” [1]

[1] https://www.ycombinator.com/legal/

rahimnathwani · on Nov 27, 2022

That page links to another page for California residents, which includes:

Exercising Your Rights: California residents can exercise the right to request deletion of Personal Information by contacting us at hn@ycombinator.com.

asdff · on Nov 27, 2022

It almost reads like this forum is designed to be a training set

9wzYQbTYsAIc · on Nov 27, 2022

I just assume that I’m contributing to a variety of machine learning efforts when I post on HN.

avereveard · on Nov 27, 2022

Regardless of your license, the site needs to comply with local regulations. Under GDPR especially, consent can be withdrawn at any time, regardless of whether it was given originally.

yjftsjthsd-h · on Nov 27, 2022

> an automated approach that doesn’t involve email, which increases the surface area for doxxing.

Under what threat model does it meaningfully increase the surface area? If you're worried about HN admins then I think email is the least of your concerns (I'm pretty sure they can see your IP address), and if you're worried about the general public then your email isn't being leaked to them so it shouldn't matter.

peruvian · on Nov 27, 2022

HN is put together with lisp-flavored duct tape. It is not reddit or a Discourse forum with robust admin tools.

I had to email dang to change my username (quick response btw!).

Beldin · on Nov 27, 2022

While [0] doesn't come across as GDPR-compatible to me (not a lawyer), the further explanation in [1] sounds a lot more compatible with it.

Basically, HN will work with a requester to update the site to give the desired amount of anonymity whilst preserving history as much as possible with those limitations -- including editing past comments.

Full GDPR compatibility would probably require supporting complete removal of user name and comment/submission contents as written - but even that seems on the table in [1]. (DanG could simply summarise each comment worth multiple replies and delete all the ones without replies.)

squiffsquiff · on Nov 27, 2022

Whilst the intention may be admirable, it doesn't look like this would be compliant with the GDPR right to be forgotten which applies to any natural person who can be identified.

ilyt · on Nov 27, 2022

So not GDPR compliant?

hijodelsol · on Nov 27, 2022

To my understanding they would still be GDPR-compliant if they delete your data upon receiving an email that you would like to exercise this right under GDPR, even if they don't automate that process but IANAL. Perhaps someone can confirm whether this has in fact worked for them in the past.

pifm_guy · on Nov 27, 2022

There is no requirement to automate GDPR requests.

However, all organisations must be able to handle GDPR requests via any communication channel. E.g., they need to treat a data deletion request sent via Twitter DM as a valid request if they have an official Twitter presence.

It is insufficient to require the customer fill out a special web form.

misnome · on Nov 27, 2022

Isn’t it all organisations _that do business_ in the EU? Since this is a free forum with no paid features, I wonder if it would be excluded?

chriswait · on Nov 27, 2022

IANAL but I don't think it matters whether the purpose of collection is specifically to facilitate paid features. From the European Commission:

> The GDPR applies to: [...]

> 2. a company established outside the EU and is offering goods/services (paid or for free) or is monitoring the behaviour of individuals in the EU.

Assuming account names or the content of comments constitute personal data within GDPR, I think YCombinator falls into this group.

Edit: I forgot HN collects an optional email address too, which is definitely personal data.

Details here: https://www.ycombinator.com/legal/#:~:text=Hacker%20News%20I...

wrs · on Nov 27, 2022

The GDPR applies to the data of people residing in the EU. The location and profitability of the organization collecting the data isn’t a factor. (Though it may introduce questions of enforcement.)

ldjb · on Nov 27, 2022

Although many large websites and services allow you to request erasure of your data in an automated way, this is not required by GDPR.

GDPR allows individuals to request erasure verbally or in writing, and the data controller then has one month to respond.