
The purpose of science publications is to share new results with other scientists, so others can build on or verify the correctness of the work. There has always been an element of “receiving credit” to this, but the communication aspect is what actually matters from the perspective of maximizing scientific progress.

In the distant past, publication was an informal process that mostly involved mailing around letters, or for a major result, self-publishing a book. Eventually publishers began to devise formal journals for this purpose, and some of those journals began to receive more submissions than it was feasible to publish or verify just by reputation. Some of the more popular journals hit upon the idea of applying basic editorial standards to reject badly-written papers and obvious spam. Since the journal editors weren’t experts in all fields of science, they asked for volunteers to help with this process. That’s what peer review is.

Eventually bureaucrats (inside and largely outside of the scientific community) demanded a technique for measuring the productivity of a scientist, so they could allocate budgets or promotions. They hit on the idea of using publications in a few prestigious journals as a metric, which turned a useful process (sharing results with other scientists) into [from an outsider perspective] a process of receiving “academic points”, where the publication of a result appears to be the end-goal and not just an intermediate point in the validation of a result.

Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication. This whole mess seems like it could be handled a lot more intelligently.



Very well put. This is the clearest way of looking at it in my view.

I’ll pile on to say that you also have the variable of how the non-scientist public gleans information from the academics. Academia used to be a more insular cadre of people seeking knowledge for its own sake, so this was less relevant. What’s new here is that our society has fixated on the idea that matters of state and administration should be significantly guided by the results and opinions of academia. Our enthusiasm for science-guided policy is a triple whammy:

1. Knowing that the results of your study have the potential to affect policy creates incentives that may change how the underlying science is performed.

2. Knowing that results of academia have outside influence may change WHICH science is performed, and draw in less-than-impartial actors to perform it.

3. The outsized potential impact invites the uninformed public to peer into the world of academia and draw half-baked conclusions from results that are still preliminary or unreplicated.

Relatively narrow or specious studies can gain a lot of undue traction if their conclusions appear, to the untrained eye, to provide a good bat to hit your opponent with.


A significant problem we face today is the way research, especially in academia, gets spotlighted in the media. They often hyper-focus on single studies, which can give a skewed representation of scientific progress.

The reality is that science isn't about isolated findings; it's a cumulative effort. One paper might suggest a conclusion, but it's the collective weight of multiple studies that provides a more rounded understanding. The media's tendency to cherry-pick results often distorts this nuanced process.

It's also worth noting the trend of prioritizing certain studies, like large RCTs or systematic reviews, while overlooking smaller ones, especially pilot studies. Pilot studies are foundational—they often act as the preliminary research needed before larger studies can even be considered or funded. By sidelining or dismissing these smaller, exploratory studies, we risk undermining the very foundation that bigger, more definitive research efforts are built on. If we consistently ignore or undervalue pilot studies, the bigger and often more impactful studies may never even see the light of day.


Most of this is very legit, but this

> Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication.

Does not represent my experience in the academy at all. There is a ton of gamesmanship in publishing. That is ultimately the yardstick academics are measured against, whether we like it or not. No one misunderstands that, IMO; the issue is that it's a poor incentive. I think creating a new class of publication, one that requires replication, could be workable in some fields (e.g. optics/photonics), but is probably totally impossible in others (e.g. experimental particle physics).

For purely intellectual fields like mathematics, theoretical physics, and philosophy, you probably don't need this at all. Then there are 'in the middle' fields like machine learning, which in theory would be easy to replicate, but where replication would be prohibitively expensive for, e.g., baseline training of LLMs.


And on the extreme end you have the multi-decade longitudinal studies in epidemiology / biomedicine that would be more or less impossible to replicate.


I remember reading that some epidemiologists saw the wealth of new data from COVID as a silver lining, because of how few events there are at that scale. Apparently it’s not uncommon to still use the Spanish Flu data, which is spotty at best, because it might be the only thing available at the scale you’re interested in.


IMHO, physicists, especially theoretical physicists, must be able to create a physical model of something, to confirm that their mathematical models are somewhat connected to reality. WTF is a «wave of probability»? WTF is «bending of space-time»? These things are possible only in dream-land physics.


For sharing results widely, there's arXiv. The problem is that the fan-out is now overwhelming.

The public perception of a publication in a prestigious journal as the established truth does not help, either.


> The public perception of a publication in a prestigious journal as the established truth does not help, either.

It's not so much the public perception but what governments/media/tech and other institutions have pushed down so that the public doesn't question whatever resulting policy they're trying to put forth.

"Trust the science" means "Thou shalt not question us, simply obey".

Anyone with eyes who has worked in institutions knows that bureaucracy, careerism and corruption are intrinsic to them.


Your analysis seems to portray all scientists as pure-hearted. May I remind you of the latest Stanford scandal, where the president of Stanford was found to have manipulated data?

Today, publications do not serve the same purpose as they did before the internet. It is trivial today to write a convincing paper without doing any research and get it published (https://www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/).


No subset of humanity is “pure-hearted.” Fraud and malice will exist in everything people do. Fortunately, these fraudulent incidents seem relatively rare when one compares the number of reported incidents to the number of publications and scientists. But this doesn’t change anything. The benefit of scientific publication is to make it easier to detect and verify incorrect results, which is exactly what happened in this case.

I understand that it’s frustrating it didn’t happen instantly. And I also understand that it’s deeply frustrating that some undeserving person accumulated status points with non-scientists based on fraud, and that let them take a high-status position outside of their field. (I think maybe you should assign some blame to the Stanford Trustees for this, but that’s up to you.) None of this means we’d be better off making publication more difficult: it means the metrics are bad.

PS: When TFA raises something like “the replication crisis” and then entangles it with accusations of deliberate fraud (high-profile but exceedingly rare), it’s like trying to have a serious conversation about automobile accidents but spending half the conversation on a handful of rare incidents of intentional vehicular homicide. You’re not going to get useful solutions out of this conversation, because it’s (perhaps deliberately) misunderstanding the impact and causes of the problem.


For your analogy on car accidents - a notable difference between the two is that in the case of car accidents, we are able to get numbers on when, how, and why they happen, and then draw conclusions from that.

In this case, we are not even aware of most events of fraud/"bad papers"/manipulation - the "crisis" is that we are losing faith in the science we are doing. Results that were cornerstones of entire fields are being found to be nonreproducible, making all the work built on top of them pointless (psychology, cancer research, economics, etc. - I'm being very broad).

At this point, we don't know how deep the rot goes. We are at the point of recognizing that it's real and looking for solutions. For car accidents, we're past that - we're just arguing about which solutions are best. For the replication crisis, we're still trying to find a way forward.

Like that scene in The Thing, where they test the blood? We're at the point where we don't know who to trust.

PS: what's a TFA?


Fraud isn't exceedingly rare :( It only seems that way because academia doesn't pay anyone to find it, reacts to volunteer reports by ignoring them, and the media generally isn't interested.

Fraud is so frequent and easy to find that there are volunteers who in their spare time manage to routinely uncover not just individual instances of fraud but entire companies whose sole purpose is to generate and sell fake papers on an industrial scale.

https://www.nature.com/articles/d41586-023-01780-w

Fraud is so easy and common that there is a steady stream of journals that publish entire editions consisting of nothing but AI-generated articles!

https://www.nature.com/articles/d41586-021-03035-y

SciGen - a Perl script - was written as a joke over a decade ago, yet you can page through an endless stream of papers it generated that are still getting published:

https://pubpeer.com/search?q=scigen
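
(For a sense of how little effort this takes: SciGen just recursively expands a hand-written context-free grammar until only canned text remains. Here's a minimal sketch of the technique in Python; the toy grammar below is invented for illustration and is nothing like SciGen's actual rule set, which is a much larger collection of Perl rules.)

    import random

    # Toy context-free grammar in the SciGen style. Uppercase keys are
    # nonterminals; each maps to a list of possible expansions. These
    # rules are invented for illustration only.
    GRAMMAR = {
        "SENTENCE": [
            ["Recent advances in", "FIELD", "have paved the way for", "CONCEPT", "."],
            ["The implications of", "FIELD", "have been far-reaching for", "CONCEPT", "."],
        ],
        "FIELD": [["probabilistic modalities"], ["wearable epistemologies"]],
        "CONCEPT": [["the simulation of checksums"], ["empathic archetypes"]],
    }

    def expand(symbol: str) -> str:
        """Recursively expand a symbol; anything not in the grammar is literal text."""
        if symbol not in GRAMMAR:
            return symbol
        return " ".join(expand(s) for s in random.choice(GRAMMAR[symbol]))

    print(expand("SENTENCE").replace(" .", "."))
    # e.g. "Recent advances in wearable epistemologies have paved the
    # way for the simulation of checksums."

Grammatical sentences, zero meaning - which is exactly why a human skim catches them instantly, and why a journal publishing them implies nobody skimmed.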

The problem is so prevalent that some people created the Problematic Paper Screener, a tool that automatically locates articles that contain text indicative of auto-generation.

https://dbrech.irit.fr/pls/apex/f?p=9999:1::::::
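
(To make "text indicative of auto-generation" concrete: one signal screeners like this look for is "tortured phrases" - mangled synonyms that paraphrasing software substitutes for standard technical terms. A minimal sketch of that idea; the phrase list is a tiny illustrative sample of the sort reported in the literature, not any tool's actual fingerprint set.)

    # Map from tortured phrase to the standard term it likely replaced.
    # Illustrative sample only, not a real screening tool's database.
    TORTURED_PHRASES = {
        "counterfeit consciousness": "artificial intelligence",
        "profound learning": "deep learning",
        "colossal information": "big data",
    }

    def screen(text: str) -> list[tuple[str, str]]:
        """Return (tortured phrase, likely original term) pairs found in the text."""
        lowered = text.lower()
        return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

    print(screen("We trained a profound learning model on colossal information."))
    # [('profound learning', 'deep learning'), ('colossal information', 'big data')]

Real screeners work on the same principle, just with thousands of fingerprints and fuzzier matching.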

This is all pre-ChatGPT, and it covers just the researchers who can't be bothered to write a paper at all. The more serious problem is all the human-written fraudulent papers with bad data and bad methodologies that are never detected, or are only detected by randos with blogs or Twitter accounts that you never hear about.


The wonderful thing about the Western world is that most countries value freedom of the press. The dark side of this is that you can spin up your own “scientific journal” and charge people to publish in it, game the rankings like any common SEO scam, and nobody will stop you because (especially here in the US) you’re exercising your First Amendment rights. Then people can fill it with nonsense and even script-generated fake papers. People outside the scientific community can also scam more “legitimate” for-profit journals in various ways, resulting in more silly publications that the actual scientific community has to filter out. It’s very annoying.

None of this has any more bearing on fraud by professional scientists than, say, the existence of some garbage-filled Wikimedia server or a badly-edited Wikipedia page means that the Wikipedia editors themselves are fraudsters.


With respect, I think you should research the topic more deeply before assuming that this is some sort of fringe problem that doesn't exist in the "actual" scientific community. The second link I provided is by Nature News and states specifically that the problem affects "prestigious journals" (their words).

Auto-generated papers have been published in journals from the IEEE, Elsevier, Springer Nature and other well known publishing houses. These papers have supposedly passed peer review in western journals that have been around for decades, and have been signed off by professional academics. Invariably no satisfactory explanation for how this happens is provided, with "we got hacked" being a remarkably common claim. Quite how you publish an entire magazine full of fraudulent articles due to one person getting hacked is unclear; actual newspapers and magazines don't ever have this problem.

Here's an example. The Springer Nature journal "Personal and Ubiquitous Computing" was established in 1997 and has its own Wikipedia page:

https://en.wikipedia.org/wiki/Personal_and_Ubiquitous_Comput...

The Editor-in-Chief is a British academic who also has his own Wikipedia page. So these aren't fly-by-night, no-brand nobodies. Yet this journal somehow managed to publish dozens of obviously auto-generated papers, like this one:

https://static-content.springer.com/esm/art%3A10.1007%2Fs007...

"The conversion of traditional arts and crafts to modern art design under the background of 5G mobile communication network"

or

https://static-content.springer.com/esm/art%3A10.1007%2Fs007...

"The application of twin network target tracking and support tensor machine in the evaluation of orienteering teaching"

The papers are just template paragraphs from totally unrelated topics spliced together. Nobody noticed this had happened until months after publication, strongly implying that this journal has no readers at all (this is a common theme in all these stories; the journals never seem to notice themselves). The editor agreed the papers were nonsense (his words), but blamed peer reviewers. Yet this journal claims to have a large editorial board with over 40 people on it, mostly from universities in Europe, the USA and China.

What's amazing is that this exact same "attack" had happened before: the previous year, Springer Nature had to retract over 400 papers that were auto-generated in the exact same way. They learned nothing, and appear to treat the problem with the same level of severity as filtering email spam.

And in the last six months alone we've seen major fraud scandals impacting Stanford (the President no less), Harvard and Yale. These are supposedly elite universities and researchers. Francesca Gino was earning over $1M a year. Yet their fraud is being uncovered by motivated volunteers, not any kind of systematic well funded science police.

So all the signs here point towards fraud being incredibly easy to get away with. Whole journals have literally no readers at all, academia relies on Scooby-Doo levels of policing, and supposedly prestigious brands are constantly having fraud uncovered by random tweeters, undergrads doing journalism as a hobby, etc.


As far as research is concerned, the names you mentioned are not prestigious brands. Universities are organizations that provide facilities and services in exchange for their share of grant money. Publishers publish whatever people are willing to pay for. Prominent mentions of supposedly prestigious institutions are a red flag in science reporting. Very often, either the writer is trying to promote something, or they don't know what they are talking about.

There are two parallel academias. There is the reputable high-trust one, where it's easy to get away with fraud, because people generally don't commit it. And there is the scammy one that exists to help people to game the metrics. While the two overlap a bit, they are mostly disjoint.

If you are an academic, you get a steady stream of spam from the scammy side of academia. You get calls to submit papers to a conference with "Proceedings by Springer" (but the scope of the conference is barely mentioned), you get invited to become an "ΕԀitоrial ΜҽmƄҽr" of a journal, and so on. Those are like Nigerian letters: they make it very explicit that they are scams, in order to avoid wasting people's time.

You guessed that nobody reads the journal you mentioned, and that's trivially true. Of course nobody reads journals, because their scopes are too wide. No matter what you are working on, most articles in the journals you publish in are irrelevant to you. People read only articles that look interesting or relevant. If nobody cares about an article, it doesn't get read.

While the rest of the world is based on top-down hierarchies, that's not a good model for understanding research. In general, the higher up in the hierarchy you go, the less relevant things become. The article is more relevant than the journal, and the journal is more relevant than the publisher. A rank-and-file professor is more relevant than a department chair. A department chair is more relevant than a dean. And a dean is more relevant than a chancellor/president/whatever.


I hope you appreciate the futility of trying to prove that fraud is ruining science publication by linking to a bunch of publications that capably detect and point out all the fraud.


What makes you think they detect and point out all the fraud? These sites are hobby sites run by people who just run some very basic text-filtering software. If fraud were being reliably detected, paper mills wouldn't have businesses, yet these companies seem to be quite common and exist in multiple countries.

And recall that I said all this work pre-dates ChatGPT. Using LLMs to generate scientific papers works great, and you won't be able to find them using regexes.
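
(To make the regex point concrete: template generators leave fixed stock strings that a pattern can fingerprint, while an LLM paraphrase of the same claim shares none of them. A sketch, with an invented fingerprint pattern rather than one from a real detector:)

    import re

    # Template generators reuse canned phrases, so a regex can fingerprint
    # them. This pattern is an invented example, not a real tool's rule.
    FINGERPRINT = re.compile(r"many (?:physicists|scholars) would agree that",
                             re.IGNORECASE)

    template_output = "Many physicists would agree that, had it not been for DNS, ..."
    llm_output = "There is broad agreement among researchers that, without DNS, ..."

    print(bool(FINGERPRINT.search(template_output)))  # True: canned phrase matches
    print(bool(FINGERPRINT.search(llm_output)))       # False: same claim, fresh wording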

The journals themselves admit there are serious fraud problems and that they don't know what to do about it. So it's very concerning. The world needs a trustworthy scientific literature.


Thank you - just discovered SciGen; these links are incredible.



