Censorship is always abused. Maybe this time or maybe not. But it's always abuse...

ineedasername · on Sept 11, 2021

I'm not sure it's worse than misinformation. In my field, bad data often has a more damaging impact than no data.

But I suppose it will depend on the circumstances, and I'd honestly be interested to hear your thoughts on why censorship is worse.

As for the inevitability of abuse? When it comes to corporate interests, that seems to be nearly axiomatic. The Verge's list of fascinating & horrifying exchange at Apple about app approvals & secret deals makes for a great case-study in this. [0]

[0] https://www.theverge.com/22611236/epic-v-apple-emails-projec...

chmod600 · on Sept 11, 2021

Censorship is bad data, because it is selectively excluded data.

If gamma rays randomly excluded one post in a thousand, that would be mussing data. Censors excluding one post in ten thousand is worrying because they have motivations of their own, which gamma rays do not.

lamontcg · on Sept 11, 2021

It looks like we're going to get a massive test of largely misinformation (US) vs. largely censorship (China) writ large in the coming decades. Place your bets on the outcome.

dredmorbius · on Sept 11, 2021

From my understanding, China's model of media control focuses more on dillution and distraction than on overt censorship.

Both exist. But the larger effort is put into distraction.

The recent Russian model is more on bullshit and subverting notions of trust entirely.

American propaganda seems largely based on a) what sells and b) promoting platitudes, wishful thinking, and c) (at least historically) heart-warming (rather than overtly divisive) notions of nationalism.

The c) case is now trending more toward divisive and heat-worming.

unityByFreedom · on Sept 12, 2021

> Both exist.

Yes, censorship and propaganda go hand in hand. In 1922 Walter Lippmann wrote in his seminal work, Public Opinion,

> Without some form of censorship, propaganda in the strict sense of the word is impossible. In order to conduct a propaganda there must be some barrier between the public and the event. [1] [2]

[1] https://www.gutenberg.org/cache/epub/6456/pg6456.html

[2] https://en.wikipedia.org/wiki/Public_Opinion_(book)#News_and...

dredmorbius · on Sept 12, 2021

Right.

Both are also tied inherently to monopoly, along with surveillance and both general and targeted manipulation.

https://joindiaspora.com/posts/7bfcf170eefc013863fa002590d8e...

https://news.ycombinator.com/item?id=24771470

jjcon · on Sept 11, 2021

This isn't 'some bad data' vs 'no data'

This is 'some bad data' vs 'systemically biased data' and the latter is much worse. Most datasets will contain some bad data but it can be worked around because the errors are random.

ipaddr · on Sept 11, 2021

Bad data vs no data at all? I would think no data would put you out of a job while bad data would require more hires to filter the data.

klyrs · on Sept 11, 2021

I prefer employees who say "I don't know" over confident bullshitters.

dredmorbius · on Sept 11, 2021

No data clearly indicates that there is no data.

A statement of "I don't know' clearly indicates a lack of knowledge.

A statemnt of "I have no opinion" clearly indicates that the speaker has not formed an opinion.

In each case, a spurious generated response:

1. Is generally accepted as prima facie evidence of what it purports.

2. Must be specifically analysed and assessed.

3. Is itself subject to repetition and/or amplification. With empirical evidence suggesting that falsehoods outcompete truths, particularly on large networks operating at flows which overload rational assessment.

4. Competes for attention with other information, including the no-signal case specifically, which does very poorly against false claims as it is literally nothing competing against an often very loud something.

Yes: bad data is much, much, much, much worse than no data.

inkblotuniverse · on Sept 11, 2021

Data that's had data censored from it is bad data.

dredmorbius · on Sept 11, 2021

False.

Outlier exclusion is standard practice.

It's useful to note what is excluded. But you exclude bad data from the analysis.

Remember that what you're interested in is not the data but the ground truth that the data represent. This means that the full transmission chain must be reliable and its integrity assured: phenomenon, generated signal, transmission channel, receiver, sensor, interpretation, and recording.

Noise may enter at any point. And that noise has ... exceedingly little value.

Deliberately inserted noise is one of the most effective ways to thwart an accurate assessment of ground truths.

ineedasername · on Sept 12, 2021

Defining terms here is important, so let's avoid the word bad for a moment because it can be applied in different ways.

1) You can have an empty dataset.

2) You can have an incomplete dataset.

3) you can have a dataset where the data is wrong

All of these situations, in some sense, are "bad"

What I'm saying is that, going into a situation, my preference would be #2 > #1 > #3.

Because I always assume a dataset could be incomplete, that it didn't capture everything. I can plan for it, look for evidence that something is missing, try to find it. If I suspect something is missing but can't find it then I at least know that much, and maybe even the magnitude of uncertainty that adds to the situation. Either way, I can work around it understanding the limits if what I'm doing or if there's too much missing, make a judgement call and say that nothing useful can be done with it.

If I have what appears to be a dataset that I can work with, but the data is all incorrect, I may never even know it until things start to break or, before that if I'm lucky, I waste large amounts of time to find out that the results just don't make sense.

It's probably important to note that #2 and #3 are also not mutually exclusive. Getting out of the dry world of data analysis, if your job is propaganda & if you're good at your job, #2 and #3 combined is where you're at.

inkblotuniverse · on Sept 12, 2021

I'd argue Facebook's censorship leaves us with 2 and 3. They don't remove things bevause they're wrong; they remove them because they go against the current orthodoxy. Most things are wrong, so most things that go against the modern orthodoxy are wrong... but wrong things that go WITH the modern orthodoxy aren't removed.

It's a scientist who removes outliers in the direction that refute his ideas, but not ones in the direction that support it.

dredmorbius · on Sept 12, 2021

Let's note that this thread's been shifting back and forth between information which is publicised over media and data, with the discussion focusing on use in research.

These aren't entirely dissimilar, but they have both similarities and differences.

Data in research is used to confirm or deny models, that is, understandings of the world.

Data in operations is used to determine and shape actions (including possibly inaction), interacting with an environment.

Information in media ... shares some of this, but is more complex in that it both creates (or disproves) models, and has a very extensive behavioural component involving both individual and group psychology and sociology.

Media platform moderation plays several roles. In part, it's performed in the context that the platforms are performing their own selection and amplification, and that there's now experimental evidence that even in the absence of any induced bias, disinformation tends to spread especially in large and active social networks.

(See "Information Overload Helps Fake News Spread, and Social Media Knows It". (https://www.scientificamerican.com/article/information-overl...), discussed here https://news.ycombinator.com/item?id=28495912 and https://news.ycombinator.com/item?id=25153716)

The situation is made worse when there's both intrinsic tooling of the system to boost sensationalism (a/k/a "high engagement" content), and deliberate introduction of false or provocative information.

TL;DR: moderation has to compensate and overcome inherent biases for misinformation, and take into consideration both causal and resultant behaviours and effects. At the same time, moderation itself is subject to many of the same biases that the information network as a whole is (false and inflammatory reports tend to draw more reports and quicker actions), as well as spurious error rates (as I've described at length above).

All of which is to say that I don't find your own allegation of an intentional bias, offered without evidence or argument, credible.

ineedasername · on Sept 13, 2021

An excellent distinction. In the world of data with research & operations, I only very rarely deal with data that is intentionally biased. Counted on the fingers of my hand. Cherry picked is more common, but intentionally wrong to present things in a different light, that's rare.

Well, it's rare that I know of. The nature of things is that I might never know. But most people that don't work with data as a profession also don't know how to create convincingly fake data, or even cherry pick without leaving the holes obvious. Saying "Yeah, so I actually need all of the data" isn't too uncommon. Most of the time it's not even deliberate, people just don't understand that their definition of "relevant data" isn't applicable. Especially when I'm using it to diagnose a problem with their organization/department/etc.

Propaganda... Well, as you said there's some overlap in the principles. Though I still stand by more preference of #2 > #1 > #3. And #3 > 2&3 together.

rhaksw · on Sept 13, 2021

Does your research data include moderator actions? I imagine such data may be difficult to gather. On reddit it's easy since most groups are public and someone's already collected components for extracting such data [1].

I show some aggregated moderation history on reveddit.com e.g. r/worldnews [2]. Since moderators can remove things without users knowing [3], there is little oversight and bias naturally grows. I think there is less bias when users can more easily review the moderation. And, there is research that suggests if moderators provide removal explanations, it reduces the likelihood of that user having a post removed in the future [4]. Such research may have encouraged reddit to display post removal details [5] with some exceptions [6]. As far as I know, such research has not yet been published on comment removals.

[1] https://www.reddit.com/r/pushshift/

[2] https://www.reveddit.com/v/worldnews/history/

[3] https://www.reveddit.com/about/faq/#need

[4] https://www.reddit.com/r/science/comments/duwdco/should_mode...

[5] https://www.reddit.com/r/changelog/comments/e66fql/post_remo...

[6] https://www.reveddit.com/about/faq/#reddit-does-not-say-post...

dredmorbius · on Sept 13, 2021

Data reliability is highly dependent on the type of data you're working with, and the procedures, processes, and checks on that.

I've worked with scientific, engineering, survey, business, medical, financial, government, internet ("web traffic" and equivalents), and behavioural data (e.g., measured experiences / behavour, not self-reported). Each has ... its interesting quirks.

Self-reported survey data is notoriously bad, and there's a huge set of tricks and assumptions that are used to scrub that. Those insisting on "uncensored" data would likely scream.

(TL;DR: multiple views on the same underlying phenomenon help a lot --- not necessarily from the same source. Some will lie, but they'll tend to lie differently and in somewhat predictable ways.)

Engineering and science data tend to suffer from pre-measurement assumptions (e.g., what you instrumented for vs. what you got. "Not great. Not terrible" from the series Chernobyl is a brilliant example of this (the instruments simply couldn't read the actual amount of radiation).

In online data, distinguishing "authentic" from all other traffic (users vs. bots) is the challenge. And that involves numerous dark arts.

Financial data tends to have strong incentives to provide something, but also a strong incentive to game the system.

I've seen field data where the interests of the field reporters outweighed the subsequent interest of analysts, resulting in wonderfully-specified databases with very little useful data.

Experiential data are great, but you're limited, again, to what you can quantify and measure (as well has having major privacy and surveillance concerns, often other ethical considerations).

Government data are often quite excellent, at least within competent organisations. For some flavour of just how widely standards can vary, though, look at reports of Covid cases, hospitalisations, recoveries, and deaths from different jurisdictions. Some measures (especially excess deaths) are far more robust, though they also lag considerably from direct experience. (Cost, lag, number of datapoints, sampling concerns, etc., all become considerations.)

It's complicated.

ineedasername · on Sept 13, 2021

I've worked with a decent variety as well, though nothing close to engineering.

>Self-reported survey data is notoriously bad

This is my least favorite type of data to work with. It can be incorrect either deliberately or through poor survey design. When I have to work with surveys I insist that they tell me what they want to know, and I design it. Sometimes people come to me when they already have survey results, and sometimes I have to tell them there's nothing reliable that I can do with to. When I'm involved from the beginning, I have final veto. Even then I don't like it. Even a well designed survey with proper phrasing, unbiased likert scales, etc can have issues. Many things don't collapse nicely to a one-dimensional scale. Then there is the selection bias inherent when by definition you only receive responses from people willing to fill out the survey. There are ways to deal with that, but they're far from perfect.

dredmorbius · on Sept 14, 2021

Q: What's the most glaring sign of a failed survey analysis project?

A: "I've conducted a survey and need a statistician to analyse it for me."

(I've seen this many, many, many times. I've never seen it not be the sign of a completely flawed aproach.)

judge2020 · on Sept 11, 2021

Bad data is often taken as good data, because sifting through it incurs 100x more friction than taking it at face value. When you ultimately get bad results you can just blame the bad data, and you still end up with a paycheck for the month(s) you wasted.

ALittleLight · on Sept 11, 2021

As a metaphor, you can imagine a blind person in the wilderness who has no idea what is in front of him. He will proceed cautiously, perhaps probing the ground with a stick or his foot. You could also imagine a delusional man in the same wilderness incorrectly believing he's in the middle of a foot race. The delusional man just run forward at full speed. If the pair are in front of a cliff...

As the saying goes, it's not what you don't know that gets you into trouble. It's what you know for sure that just ain't so.

ineedasername · on Sept 12, 2021

Not quite: if you have no data, you get new hires and news systems to collect and track it.

You may be ignorant, but you know it, and can deal with it. Let's call is starting from 0.

When you have bad data, you frequently don't know that you have bad data until things go very very wrong. You aren't starting from 0. 0 would be an improvement.

dredmorbius · on Sept 12, 2021

This seems like extending the "known knowns" concept to an additional dimension, involving truth.

In the known-knowns model, you have knowledge and metaknowledge (what you know, what you know you know):

     K   U   -- What you know
  K  KK  KU

  U  UK  UU
   \
    What you know you know

If we add truth to that, you end up with a four-dimensional array with dimensions of knowledge, knowledge of knowledge, truth-value, and knowledge-of-truth-value. Rather than four states, there are now 16:

         TT   TF   FT   FF   (Truth & belief of truth)
        ---- ---- ---- ----
  KK  | KKTT KKTF KKFT KKFF
  KU  | KUTT KUTF KKFT KKFF
  UK  | UKTT UKTF UKFT UKFF
  UU  | UUTT UUTF UUFT UUFF

False information is the FT and FF columns.

In both the TF and FT columns, belief of the truth-value of data is incorrect.

In both the KU and UU columns, there is a lack of knowledge (e.g., ignorance), either known or unknown.

(I'm still thinking through what the implications of this are. Mapping it out helps structure the situation.)

ineedasername · on Sept 13, 2021

Reminds me of epistemology, where this is a distinction between certainty and truth.

dredmorbius · on Sept 13, 2021

Yes, there's an element of that in this.

kevin_thibedeau · on Sept 11, 2021

Authoritarian countries collapse because everyone is lying about reality. Same thing happens with metric driven management.

TulliusCicero · on Sept 11, 2021

Any platform where people can speak to an audience needs some kind of 'censorship', otherwise you'll quickly find it's a platform solely for trolls and the like.

rgrieselhuber · on Sept 11, 2021

Censorship is the sloppiest possible solution to the epistemological crisis. I thought we figured this out during the Enlightenment.

titzer · on Sept 11, 2021

Censorship, meet Filter Bubble. Filter Bubble, Censorship. And this little tike you've got with you, what's his name? Engagement Metric? Oh, how cute. Nice to meet you. You look innocent. I'm guessing you couldn't do any major societal damage at all. You're certainly not a little problem child.

Noumenon72 · on Sept 11, 2021

Filter bubble is not a real problem:

https://twitter.com/degenrolf/status/1261164727486615559?lan...

https://twitter.com/degenrolf/status/1067780924014772224

Whereas censorship is lindy among things that have bad effects on society.

https://en.wikipedia.org/wiki/Lindy_effect

So give the most caution against the proven bad thing and not the one you're in a trendy moral panic about.

titzer · on Sept 11, 2021

So Rolf Degen argues that filter bubbles aren't a problem, and if you think they are, then it's because you're a "political junkie" trapped in one. That's some pretty twisted logic mixed with a nice helping of poisoning the well. I guess political junkies would never do anything crazy like assault the capital building to prevent certification of an election result. Yep, nothing to see here.

I'm going to adopt this style of argument from now on.

"Oh, you think that X is a big problem? Well, it isn't, because you have problem X, and only think that way because of it! It's your cognitive distortions talking! Zing!"

inkblotuniverse · on Sept 11, 2021

A couple of boomers went on an unguided tour, unarmed. I didn't know insurrectionists tended to leave their guns at home

ribosometronome · on Sept 11, 2021

Almost every photo I've seen of the event has not been "a couple of" anyone nor largely "boomers".

On a similar note, I somehow doubt if people broke through the doors to enter your home, assaulted people trying to protect it, yelled about how they want you dead, and then took some of your stuff you'd be calling it an "unguided tour".

chmod600 · on Sept 11, 2021

The question is: do we want to learn the lessons of history the easy way, or the hard way?

whatshisface · on Sept 11, 2021

New people are being born every day who weren't around for the steps forward made in the past. If only there was an institution that could step up to the task of teaching them. Instead there are institutions for getting them to buy toys and making them do algebra drills.

rgrieselhuber · on Sept 11, 2021

It's a really hard problem to solve. History has shown time and again how easy it is to coopt institutions as well.

wussboy · on Sept 11, 2021

If my kids were late coming home from school, and I asked you if you know what has happened, and you either:

1. Don't say anything because my neighbour tapes your mouth shut

2. Lie and say, "They were brutally murdered by your neighbour", resulting in a dead neighbour followed by my kids showing up unharmed from school

...can you explain in this scenario how censorship is worse than misinformation.

I'm not trying do be a jerk. I hear your argument a lot (especially on tech-heavy web sites) and I want to understand it.

rglullis · on Sept 11, 2021

I think there is a tiny little bit of a jump if you are acting this quickly and this harshly on information without verifying.

Concretely to your hypothetical: don't attribute to misinformation the issue that is most like your barbaric reaction. Not to say that the liar should not be punished, it should bear a big responsibility in the consequences of the actions. But at the end of day it was not the liar the one that killed your neighbor, you were.

ribosometronome · on Sept 11, 2021

It would be, but also you showed him pictures to prove it, he just didn't know they were photoshopped. And linked him to a news article on thebostontribune.com that was reporting that his kids were dead. And his family and friends were sharing their condolences.

It's not as if folk AREN'T acting on misinformation or showing that they aren't really capable of distinguishing between the two. Tons can. And tons won't realize that The Boston Tribune isn't real.

We're having to deal with almost literally shouting "fire!" in a crowded theater when there's no fire, only there's special effects and major campaigns to convince people there's fire, not just taking some guy at their word and stampeding because of it.

rglullis · on Sept 11, 2021

This seems like really stretching the analogy just to remove personal responsibility.

If I am the father of the missing children and I see the "family and friends" sharing their condolences, I would go talk to them first. If someone comes with pictures trying to accuse someone of something, no matter how shocking the accusations, there would still be the question of (a) why is someone bothering with taking pictures and not taking to the authorities beforehand and (b) what are the consequences for me if I went on a rampage attack based on bogus evidence.

To get a little bit on topic: the reason that censorship is worse than misinformation is that we should always operate on the premise that our information is incomplete, inaccurate or distorted by those controlling the information channels.

Without censorship, I can listen to different sources (no matter how crazy or unsound they are) and I can try to discern what makes sense and does not. With censorship, any dissent is silenced, so we get one source of information - who can never get questioned - or worse we get to see many sources of information but only the ones that are aligned with the censors and gives us a false consensus and the illusion of quality in information.

Only idiots can walk around in the world of today and confidently repeat whatever they hear from "official" sources as unquestionable truths.

wussboy · on Sept 11, 2021

Thanks for your reply, rlgullis.

The extremes of my example were only to show that there could be real and serious consequences from misinformation rather than silence. If we dial it back from "killing my neighbour" to "lost my job" or even "missed my bus", I believe my point still stands. In many scenarios that we experience every day, we would be better served by accepting censure over misinformation.

You claim "we should always operate on the premise that our information is incomplete, inaccurate or distorted by those controlling the information channels" and I agree with you in theory. But in practice this is impossible. The human brain is physically unable to work everything through from first principles. This makes sense conceptually and has been verified in research.

And this to me is the fundamental issue of our time:

In theory, social media and unrestrained free speech are a boon for all society.

In practice they have turned people against each other with very real and serious consequences.

rglullis · on Sept 12, 2021

> In many scenarios that we experience every day, we would be better served by accepting censure over misinformation.

No. Not at all. I refuse your premise. Not only you are begging the question here (what scenarios? Your example was terrible and I really don't think you can come up with a good one), I honestly worry more about those that believe this rhetoric than the "victims" of misinformation.

Also, it's curious how those that so easily accept censorship never think that they will eventually be on the wrong side of the taser gun.

> I agree with you in theory. But in practice this is impossible. The human brain is physically unable to work everything through from first principles.

Good thing then that this is NOT WHAT I AM SAYING.

There is no need to "work though things from first principles". The idea is NOT to determine a priori what is "right" or "safe" and then make a binary decision. The base idea is to decide on what action to take (or to refuse to take) by asking yourself what is the worst possible thing that can happen if the information I have is wrong? What are the odds of me being wrong?.

I'd suggest you get acquainted with Nassim Taleb and Joe Norman to understand better how to deal with complexity and uncertainty.

> In practice they have turned people against each other with very real and serious consequences.

Bullshit. There was no Facebook during the time of the Crusades. There was no Twitter during the Cold War and no smartphones during WW1 and WW2. None of these things would be avoidable if only we could censor wrongthink.

On the other hand, THERE ARE video records of Tienanmen Square who have been successfully hidden from an entire country for an entire generation.

(Sorry for the harsh language, but I start reading any kind of censorship-apologetic and fighting instincts kick in. If you don't see how much of a sign of being morally bankrupt it is to casually defend the hellish things like state-sponsored censorship, I see no point in continuing the "debate")

wussboy · on Sept 12, 2021

Hey, no worries, rlgullis. I get heated too. :) I don’t know if it will be fruitful to continue this discussion here either, but I appreciate your comments and I suspect that if we spent an afternoon trying we’d find our common ground was vast. Have a good one!

rglullis · on Sept 13, 2021

There are some things that - no matter how much "common ground" we have - simply can not be discussed in relative terms. Advocating that we all should be subjected to censorship and silence anyone who speaks against the status quo is one of them.

To think that is okay to have one all-too-powerful entity controlling information channels is stepping into fascism and totalitarianism. This is a lesson that we should have learned already: no possible good comes out of that.

tommymachine · on Sept 11, 2021

[flagged]

otterley · on Sept 11, 2021

Publishing provably false statements or reports with intent to deceive (or even just gross negligence) falls pretty squarely under the definition of misinformation. This isn’t very controversial, except among nut cases.

tommymachine · on Sept 11, 2021

Oh. I see. So when a computer system sends signals over a wire, if it represents "provably false statements", it's actually misinformation, and not information. All those 1s and 0s instantly switch from information to misinformation, the minute their final representative form embodies a "provably false statement".

Who decides what's provably false, by the way?

Are the novels of Tolkien "misinformation", since it could presumably be easily proved that the events described in them didn't actually happen?

And what is the burden of proof for "intent to deceive"? And in which court is this all decided?

Who decides? Just people in your group right? What ever your group happens to be. Sure hope we all worship your god then, because the other gods are all "Misinformation".

bashinator · on Sept 11, 2021

Censorship being “worse than misinformation” seems like whataboutism, given that FB has serious problems with both.

HideousKojima · on Sept 11, 2021

It's not a whataboutism, it's pointing out that with one of the most popular proposed solutions to misinformation, censorship, the cure is worse than the disease.

ineedasername · on Sept 11, 2021

I like the quote the from the Supreme court on the topic:

"If there be time to expose through discussion, the falsehoods and fallacies, to avert the evil by the processes of education, the remedy to be applied is more speech, not enforced silence"

However, I think it's also important to recognize that in today's algorithmically driven content presentation, "more speech" is often comically ineffective because it is never consumed in the emergent content bubbles that silo people from contradictory information. Not to mention the fact that misinformation that confirms your preconceptions is a much more powerful influence than actual information that contradicts them. Given this, an important caveat embedded in the above quote is: "If there be time". A recognition of the fact that, in some circumstances, there will not be an opportunity for more speech to prevail.

I don't have a solution to this. There may be no good solution to this, except lesser degrees of bad solutions.

bashinator · on Sept 12, 2021

The degree to which social media allows both the amplification, and as you said, siloing of viewpoints to a mass audience, is IMO qualitatively different from the media available pre-Internet.

justbored123 · on Sept 11, 2021

Buddy Buddy WAKE UP, Facebook is not a public service, it's not there to serve a nobody like you, it's a private company that exist to make money. You can't use their service to trash them and promote their competition, that is not a reasonable business model for them. To finance you by giving you a free platform to promote their competition and trash them is not a good trade for them. DO YOU UNDERSTAND???

inkblotuniverse · on Sept 11, 2021

If it serves the public, it's a public service. Utility companies were private companies until people decided they weren't.

ribosometronome · on Sept 11, 2021

What companies would not be public services under that definition?

inkblotuniverse · on Sept 13, 2021

Ones that aren't natural or in-practice monopolies