No subset of humanity is “pure hearted.” Fraud and malice will exist in everythi...

dmbche · on Aug 6, 2023

For your analogy on car accidents - a notable difference between the two is that in the case of car accidents, we are able to get numbers on when, how and why they happen and then draw conclusions from that.

In this case, we are not even aware of most events of fraud/"bad papers"/manipulation - the "crisis" is that we are losing faith in the science we are doing - results that were cornerstones of entire fields are found to be nonreproducible, making all the work built on top of them pointless (psychology, cancer, economics, etc - I'm being very broad).

At this point, we don't know how deep the rot goes. We are at the point of recognizing that it's real, and looking for solutions. For car accidents, we're past that - we're just arguing about what are the best solutions. For the replication crisis, we're trying to find a way forward.

Like that scene in The Thing, where they test the blood? We're at the point where we don't know who to trust.

Ps: what's a tfa?

mike_hearn · on Aug 6, 2023

Fraud isn't exceedingly rare :( It only seems that way because academia doesn't pay anyone to find it, reacts to volunteer reports by ignoring it, and the media generally isn't interested.

Fraud is so frequent and easy to find that there are volunteers who in their spare time manage to routinely uncover not just individual instances of fraud but entire companies whose sole purpose is to generate and sell fake papers on an industrial scale.

https://www.nature.com/articles/d41586-023-01780-w

Fraud is so easy and common that there is a steady stream of journals which publish entire editions consisting of nothing but AI generated articles!

https://www.nature.com/articles/d41586-021-03035-y

Despite being written as a joke over a decade ago, you can page through an endless stream of papers that were generated by SciGen - a Perl script - and yet they are getting published:

https://pubpeer.com/search?q=scigen

The problem is so prevalent that some people created the Problematic Paper Screener, a tool that automatically locates articles that contain text indicative of auto-generation.

https://dbrech.irit.fr/pls/apex/f?p=9999:1::::::

This is all pre-ChatGPT, and is just the researchers who can't be bothered writing a paper at all. The more serious problem is all the human-written fraudulent papers with bad data and bad methodologies that are never detected, or only detected by randos with blogs or Twitter accounts that you never hear about.

matthewdgreen · on Aug 6, 2023

The wonderful thing about the western world is that most countries value freedom of the press. The dark side of this is that you can spin up your own “scientific journal” and charge people to publish in it, game the rankings like any common SEO scam, and nobody will stop you because (especially here in the US) you’re exercising your first amendment rights. Then people can fill it with nonsense and even script-generated fake papers. People outside the scientific community can also scam more “legitimate” for-profit journals in various ways, resulting in more silly publications that the actual scientific community has to filter out. It’s very annoying.

None of this has any more bearing on fraud by professional scientists than the existence of some garbage-filled Wikimedia server or a badly edited Wikipedia page has on whether the Wikipedia editors themselves are fraudsters.

mike_hearn · on Aug 7, 2023

With respect, I think you should research the topic more deeply before assuming that this is some sort of fringe problem that doesn't exist in the "actual" scientific community. The second link I provided is by Nature News and states specifically that the problem affects "prestigious journals" (their words).

Auto-generated papers have been published in journals from the IEEE, Elsevier, Springer Nature and other well known publishing houses. These papers have supposedly passed peer review in western journals that have been around for decades, and have been signed off by professional academics. Invariably no satisfactory explanation for how this happens is provided, with "we got hacked" being a remarkably common claim. Quite how you publish an entire magazine full of fraudulent articles due to one person getting hacked is unclear; actual newspapers and magazines don't ever have this problem.

Here's an example. The Springer Nature journal "Personal and Ubiquitous Computing" was established in 1997 and has its own Wikipedia page:

https://en.wikipedia.org/wiki/Personal_and_Ubiquitous_Comput...

The Editor-In-Chief is a British academic, who also has his own Wikipedia page. So these aren't fly-by-night no-brand nobodies. Yet this journal somehow managed to publish dozens of obviously auto-generated papers, like this one:

https://static-content.springer.com/esm/art%3A10.1007%2Fs007...

"The conversion of traditional arts and crafts to modern art design under the background of 5G mobile communication network"

or

https://static-content.springer.com/esm/art%3A10.1007%2Fs007...

"The application of twin network target tracking and support tensor machine in the evaluation of orienteering teaching"

The papers are just template paragraphs from totally unrelated topics spliced together. Nobody noticed this had happened until months after publication, strongly implying that this journal has no readers at all (this is a common theme in all these stories, they never seem to notice themselves). The editor agreed the papers were nonsense (his words), but blamed peer reviewers. Yet this journal claims to have a large editorial board with over 40 people on it, mostly from universities in Europe, the USA and China.

What's amazing is that this exact same "attack" had happened before. The previous year Springer Nature had to retract over 400 papers which were auto-generated in the exact same way. They learned nothing and appear to treat the problem as a similar level of severity to filtering email spam.

And in the last six months alone we've seen major fraud scandals impacting Stanford (the President no less), Harvard and Yale. These are supposedly elite universities and researchers. Francesca Gino was earning over $1M a year. Yet their fraud is being uncovered by motivated volunteers, not any kind of systematic, well-funded science police.

So all the signs here point towards fraud being incredibly easy to get away with. Whole journals have literally no readers at all, academia relies on Scooby-Doo levels of policing, and supposedly prestigious brands are constantly having fraud uncovered by random tweeters, undergrads doing journalism as a hobby etc.

jltsiren · on Aug 7, 2023

As far as research is concerned, the names you mentioned are not prestigious brands. Universities are organizations that provide facilities and services in exchange for their share of grant money. Publishers publish whatever people are willing to pay for. Prominent mentions of supposedly prestigious institutions are a red flag in science reporting. Very often, either the writer is trying to promote something, or they don't know what they are talking about.

There are two parallel academias. There is the reputable high-trust one, where it's easy to get away with fraud, because people generally don't commit it. And there is the scammy one that exists to help people to game the metrics. While the two overlap a bit, they are mostly disjoint.

If you are an academic, you get a steady stream of spam from the scammy side of academia. You get calls to submit papers to a conference with "Proceedings by Springer" (but the scope of the conference is barely mentioned), you get invited to become an "ΕԀitоrial ΜҽmƄҽr" of a journal, and so on. Those are like Nigerian letters. They make it very explicit that they are scams, in order to avoid wasting people's time.

You guessed that nobody reads the journal you mentioned, and that's trivially true. Of course nobody reads journals, because their scopes are too wide. No matter what you are working on, most articles in the journals you publish in are irrelevant to you. People read only articles that look interesting or relevant. If nobody cares about an article, it doesn't get read.

While the rest of the world is based on top-down hierarchies, that's not a good model for understanding research. In general, the higher up in the hierarchy you go, the less relevant things become. The article is more relevant than the journal, and the journal is more relevant than the publisher. A rank-and-file professor is more relevant than a department chair. A department chair is more relevant than a dean. And a dean is more relevant than a chancellor/president/whatever.

snowwrestler · on Aug 7, 2023

I hope you appreciate the futility of trying to prove that fraud is ruining science publication by linking to a bunch of publications that capably detect and point out all the fraud.

mike_hearn · on Aug 7, 2023

What makes you think they detect and point out all the fraud? These sites are hobby sites run by people who just run some very basic text filtering software. If fraud were being reliably detected, paper mills wouldn't be in business, yet these companies seem to be quite common and exist in multiple countries.

And recall that I said all this work pre-dates ChatGPT. Using LLMs to generate scientific papers works great, and you won't be able to find them using regexes.
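
For a sense of how basic that kind of filtering can be, here is a minimal sketch in Python. It assumes a short, hand-picked phrase list; the phrases and the names `SUSPECT_PHRASES` and `flag_text` are hypothetical and for illustration only, not the actual fingerprints or code of any screening site.

```python
import re

# Hypothetical fingerprint phrases, chosen for illustration only.
# A real screener would need a much larger, curated list.
SUSPECT_PHRASES = [
    "counterfeit consciousness",
    "profound learning",
    "colossal information",
]

# One case-insensitive pattern; \s+ tolerates line breaks inside a phrase.
PATTERN = re.compile(
    "|".join(
        r"\b" + r"\s+".join(re.escape(word) for word in phrase.split()) + r"\b"
        for phrase in SUSPECT_PHRASES
    ),
    re.IGNORECASE,
)


def flag_text(text):
    """Return the sorted, whitespace-normalized suspect phrases found in text."""
    hits = {" ".join(m.group(0).lower().split()) for m in PATTERN.finditer(text)}
    return sorted(hits)
```

A filter like this only catches text that contains a listed phrase. Fluent text that avoids those phrases returns an empty list, which is the limitation described above.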

The journals themselves admit there are serious fraud problems and that they don't know what to do about it. So it's very concerning. The world needs a trustworthy scientific literature.

dmbche · on Aug 6, 2023

Thank you - just discovered Scigen, these links are incredible