
Those are the causes that were picked up on initially, because they're the problems you can detect just by reading a paper without getting into the details. They're not the only causes though, just some of the most visible.

An incomplete list of other causes might include:

• Wrong data or maths. A remarkably large number of papers (e.g. in psychology) report statistical aggregates that are mathematically impossible given the study design, like means that can't be produced by any allowable combination of inputs; a GRIM-style consistency check for this is sketched after the list. Some contain figures that are mathematically possible but wildly implausible in reality [1].

• Fraud. I've seen estimates that maybe 20% of all published clinical trials were never actually conducted at all. Anyone who tried to figure out the truth about COVID+ivermectin got a taste of this, because a staggering number of studies turned out to exhibit disturbing signs of trial fraud. Researchers will happily include obvious Photoshops in their papers, and journals will do their best to ignore reports about it [2].

• Bugs. Code doesn't get peer reviewed, and sometimes isn't released either. The famous Report 9 Imperial College London COVID model had been in development since 2004 but was riddled with severe bugs: buffer overflows, race conditions, even a typo in the constants of its hand-rolled PRNG [3]. As a consequence the model produced very different numbers every time you ran it, despite fixed PRNG seeds being passed in on the command line (the second sketch after this list shows how that can happen). The authors didn't care, because they'd convinced themselves it wasn't actually a problem (and if you're about to argue with me on this, please don't; if I have to listen to one more academic explain that scientists don't have to write deterministic "codes" I'll probably puke).

• Pretend data sharing. Twitter bot papers have perfected the art of releasing non-replicable data analysis: they select a bunch of tweets on topics Twitter is likely to ban, label them "misinformation", and then in their publicly shared data include only the tweet IDs, not the content. Anyone attempting to double-check will discover that almost all the tweets are no longer available, so the classifications can't be disputed (the last sketch after this list shows what such an audit looks like). They get to claim the analysis is replicable because they "shared their data", even though it isn't.

• Methods that are described too vaguely to ever be replicated.
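
To make the first bullet concrete, here's a minimal sketch of the GRIM test (Brown & Heathers; a relative of the SPRITE technique in [1]). If n people each give an integer response, the true mean must be k/n for some integer k, so a reported mean that isn't within rounding distance of any such value is impossible. The function name and sample numbers are mine, for illustration:

    import math

    def grim_consistent(mean, n, decimals=2):
        # A mean of n integer responses must equal k/n for some integer k.
        # Check the two candidate sums bracketing mean * n against the
        # mean as reported, at the paper's stated precision.
        for k in (math.floor(mean * n), math.ceil(mean * n)):
            if round(k / n, decimals) == round(mean, decimals):
                return True
        return False

    # 28 integer responses can average 5.18 (145/28) or 5.21 (146/28),
    # but no combination of them produces a reported mean of 5.19:
    print(grim_consistent(5.18, 28))  # True
    print(grim_consistent(5.19, 28))  # False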
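On the bugs bullet, this toy example (mine, not the ICL code) shows how a fixed seed is no guarantee of determinism once threads are involved. Each worker is fully deterministic given its seed, but the partial results are combined in whatever order the threads happen to finish, and floating-point addition isn't associative, so the final total can drift between runs:

    import random
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def simulate(seed):
        # Fully deterministic given its seed.
        rng = random.Random(seed)
        return sum(rng.random() for _ in range(1_000_000))

    def run():
        total = 0.0
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(simulate, s) for s in range(8)]
            # as_completed yields in finish order, which varies between
            # runs; since (a + b) + c != a + (b + c) in floating point,
            # the same eight partial sums can give different totals.
            for f in as_completed(futures):
                total += f.result()
        return total

    print(run())  # low-order digits can change from run to run

Pinning the combination order (e.g. summing the results in submission order) restores determinism, which is why "just run it multiple times and average" isn't an answer to a bug report like this.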
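And on the pretend-data-sharing bullet, a replication attempt boils down to something like the audit below. fetch_tweet is a hypothetical stand-in for whatever lookup endpoint is current; the point is that deleted or suspended tweets return nothing, so the published labels can never be re-checked:

    def fetch_tweet(tweet_id):
        # Hypothetical lookup: should return the tweet text, or None if
        # the tweet has been deleted, suspended, or made private. Left
        # as a stub because the real endpoints keep changing.
        return None

    def audit(dataset):
        # dataset: list of (tweet_id, label) pairs, as "shared" by the paper.
        retrievable = sum(1 for tweet_id, _ in dataset
                          if fetch_tweet(tweet_id) is not None)
        return retrievable / len(dataset)  # often near zero a few years on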

And so on and so forth. There are an unlimited number of ways to make a paper look superficially scientific without it actually telling us anything concrete that can survive being double-checked.

[1] e.g. https://hackernoon.com/introducing-sprite-and-the-case-of-th...

[2] https://blog.plan99.net/fake-science-part-i-7e9764571422

[3] https://dailysceptic.org/archive/second-analysis-of-ferguson...




