What should I do if I suspect one of the journal reviews I got is AI-generated? (academia.stackexchange.com)
140 points by j2kun on Nov 29, 2023 | 59 comments


Having just completed a reviewer workshop with the top journal in my field, and having reviewed for multiple conferences, I have several points.

1. Don't underestimate how bad human reviewers can be. I've seen really bad reviews before. But the worst were for conferences, not journals.

2. The job of an associate editor is to field the reviews and make decision recommendations to the senior editor. A good associate editor will take care of this stuff, but may let a bad review through for the sake of the process. They might emphasize a particular review to help the author understand what the editor actually thinks is important, as opposed to letting the author think that all reviews are equal. That being said, it's up to the author to respond. If a reviewer is unequivocally wrong about something, the author can explain why they didn't follow the reviewer's recommendations. What the senior editor (and to some extent the associate editor) thinks is what matters, not what the reviewer thinks.

3. If the associate editor is not doing their job of fielding and reviewing the reviews, I question whether the journal is actually a top journal. My impression thus far is that top journals take their editing seriously. So far in grad school, I've met multiple editors from multiple journals, and gone to multiple journal workshops. The amount of work these people pour into doing journal work, many for free, is staggering. The burnout rate is significant accordingly, but the ones who stay keep it up because they want to be serious custodians of their discipline's research authority. It's a massive amount of work. I'm not sure I'd want to do it myself. I can't imagine such people brushing aside bad reviews and not realizing how bad they are. This is partly why good journals also have workshops to teach how to review. It's not easy to become an associate editor either. You need to become respected enough in the community to get nominated by editors and then voted in by editors. They have a standard based on how much they respect researchers.

Now... it's possible that my discipline (information systems) is unique in this manner. Is it possible that the top journals in computer science, physics, or other fields don't take this seriously? I doubt it?


Apparently the reviewer in this case couldn't even pass a Turing test. :/


Reaction:

Keep "AI" out of it. As described, the (suspected-AI) review seemed to only be based on the Abstract (didn't bother reading the rest of the submitted paper), and mentions several papers from irrelevant fields. Politely suggest to the editor that that reviewer was obviously struggling to review a paper well outside his area of expertise, and might best be replaced with a reviewer who is a better fit for the subject matter of your article.


In a lot of contexts, whether someone leaned on an LLM--lightly or heavily--is sort of irrelevant. The output is either good/reasonable or it's not. (Or some gradation between the two.) Any tools they used are beside the point.


Where previously it took 100 units of effort to deal with 10 units of effort of bullshit, it now takes 100 to deal with 1.

This is only irrelevant if you place no value on the time of yourself and others.


Different sort of relevancy. It is relevant to the overall process that generating good-looking bullshit has become much faster and cheaper, because it means that assessing someone's contributions is more difficult; but it is not relevant when you're giving feedback on the hypothetically critical feedback you received in the first place. All that matters there is whether it was useful or not.

Someone submitting AI generated reviews becomes relevant again when deciding whether to keep a reviewer around - a pattern of useful looking but useless and time wasting 'contributions' is relevant.

Basically don't over index on whether someone is using AI to be a shitter, focus on the problematic behavior.


It’s a shame it’s now so much easier for bullshitters to produce bullshit quickly, but it is still irrelevant to whether a given piece of work is good or not.


It's relevant because AI allows you to work faster and in larger volumes, and pushes quality down in the process, because the user will optimize for quantity rather than quality (which wouldn't be a viable choice in the absence of AI).


And even if further advances in ML can improve the "writing quality", overall quality is a much more multidimensional thing, and being able to produce a convincing-sounding review (formatted correctly, talks about actual content within the review, etc) is not the same as giving a useful one. As another comment in this subthread noted, if the author feels that an LLM-generated review is worthwhile, they can feed it to one themselves—and it's entirely possible that a specially-trained LLM could give some halfway decent reviews of some basic things like spelling and grammar, missing information or sections, that sort of thing, simply based on previous article drafts and their reviews.

We should not be predicating our concerns about LLM-generated content solely on its "quality", because ultimately, the problem with it is that it is generic. I think it unlikely that it will have the ability to produce a genuine and thoughtful critique of a journal article until and unless there are significant breakthroughs, possibly even to the level of achieving AGI or something like it. Even wider use of a more advanced, review-specific LLM like the one I describe above presents serious concerns, because it runs the risk of suppressing articles that deviate from the "norm" in ways that the LLM doesn't have any way to appreciate, but which can present the findings better or even make the science better.


If you think a lot of people don't already optimize for quantity, I have a bridge to sell you.

I do get the point that LLMs make producing crap easier but that's somewhat independent of LLMs being used generally--which is going to happen in any case.


Honestly, I don't think it's viable to manually optimize for quantity in academic paper reviewing, specifically. But I might be wrong, of course. I think it's too much work for very little profit.


I was speaking more generally. I'm not really in the "biz" but not sure what the incentives are to do a crap job of academic paper reviewing at scale.


Basically, the only incentives are to slightly improve your resume by showing you are a reviewer for reputable journals, and to get fee waivers for publishing your own work in said journal (in crappy journals, usually). But if you review a lot, you may get selected to be an editor and then climb the ladder from there, to be editor in a better journal, or become editor-in-chief, all of which can be prestigious (and paid) positions.


You seem to be arguing against some other point I have not made. I said it is irrelevant to whether a given piece of work is good or not. If a given piece of writing is good, it’s good regardless of what tools the writer used.


If however a given piece of work is good but is produced with a tool that drives down the average quality of the field it may be reasonable to ask that people not use the tool, even if they produce good work with it


This is an instance of the "99% accurate test says you have an incredibly rare disease" fallacy.

As the percentage of garbage that goes into peer review (or any other filter) increases, the percentage of garbage that manages to sneak through will increase.
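
A rough back-of-the-envelope sketch of that effect, with made-up numbers (assume a filter that catches 95% of garbage and wrongly rejects 10% of good submissions), shows the accepted pool degrading as the share of garbage coming in grows:

    # Illustrative numbers only -- not measured review statistics.
    def garbage_share_of_accepted(garbage_in, catch_rate=0.95, false_reject=0.10):
        """Fraction of accepted items that are garbage, via Bayes' rule."""
        garbage_passed = garbage_in * (1 - catch_rate)
        good_passed = (1 - garbage_in) * (1 - false_reject)
        return garbage_passed / (garbage_passed + good_passed)

    for share in (0.01, 0.10, 0.50):
        print(f"{share:.0%} garbage submitted -> "
              f"{garbage_share_of_accepted(share):.1%} of accepted work is garbage")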


We just need some sort of AI which can help us filter through all the bullshit


For me, it adds a shadow of doubt on the reliability of a peer reviewed journal if one of the "peers" is an LLM during the "it doesn't even know it's lying" AI stage we are currently in.

I read a review of The Singularity is Near by Ray Kurzweil where it was described as seeing a table full of what appears to be very delicious food, but it is then revealed that there is absolutely some amount of dog feces mixed in with some of the dishes. You can't tell which is safe, and which is carefully crafted with dog feces.

An LLM in a peer reviewed journal currently has no place, unless it is part of an experiment where it is trained on the Journal's body of work and then tested for accuracy with future articles. As the tech progresses it may find a place but if it takes twice as long to fact check the LLM output it's saving nobody time and possibly hallucinating in hard to catch ways.


It's even worse that LLMs don't have a concept of lying: they are asked to generate text and they generate it, even if it means making things up or 'lying' as we generally call it. The real issue is the people treating them as oracular fonts already.


They're as trustworthy as any other oracle then


Or as trustworthy as any other human. Humans often believe things that are objectively not true and yet they will use these things to judge other things by.


That's not a good analogy. Trusting an LLM right now is more akin to trusting a compulsive liar. (Another approximation would be assuming historical fiction was true.)

Humans usually try not to lie, and there's a particular shape to the sorts of details they tend to forget / confuse. Compulsive liars often don't even notice themselves lying. That's closer to an LLM. I trust the output of an LLM about as much as stuff George Santos says.


We seek for our gods in lowly places it seems.


Agreed. LLMs are one of the places where "the math checks out" but that doesn't mean it describes anything helpful, useable, or even correct at minimum.


https://www.cnn.com/2019/10/04/health/insect-rodent-filth-in...

I get what you're saying, but this is exactly the point of peer review though. It wouldn't be any worse than if the original author were doing shoddy work in some parts.


I understand the point you're making, philosophically, but the pragmatist in me says that this practice needs to be discouraged (though an outright ban is probably unenforceable).

If you give busy reviewers an easy "out", where they can just run the paper through an LLM, do a bit of editing then send off the review, people are going to do exactly that.

And the resulting review, with the right editing, might seem perfectly plausible and human-like. But that review isn't going to be able to offer suggestions with insight from recently published papers. It isn't going to be able to point out issues with the data, or with the statistical analysis, or with the paper's logical conclusions.

Maybe someday AI will be capable enough to replace the role of human reviewers. But right now, encouraging this practice is just going to let a lot of bad science slip through to publication without genuine peer review. (even more than the large amount that already does, let's be honest ...)


I think there are two issues bundled into your reply, and they're best addressed separately.

The small issue is whether people are responsible for what they publish under their own name. Seems like a straightforward "yes", and whatever helper tools they use are irrelevant.

The much bigger issue is why scientific publishing's standard for a review is only "plausible and human-like", allowing people to submit an LLM-generated summary of an abstract without fear of responsibility.


> If you give busy reviewers an easy "out" ...

Reviewers looking for an "easy way out" might be inevitable if they continue to remain uncompensated and uncredited for their time and efforts while journals get all the profits for other people's research.


The purpose of a reviewer is to provide the reviewer's feedback on a paper. If the editor wants to get feedback from an LLM, they are perfectly capable of doing so themselves. There is an attribution chain here that may not be terribly relevant in the short term but in the long term is a big deal.

Historically, we speak of "plagiarism" as being something you do against human text, because human text is all there was. But I would suggest that most of the issues with plagiarism are actually around misattribution, which means that it is perfectly sensible to speak of "plagiarizing" an AI. The AI may not be victimized, but victimization is not the only issue with plagiarism and most or all of the rest of them apply here. It matters over time where the text comes from. Even if the text of the review is high quality, in order to tune the editor's own tracking of reputation they need to know if it is from a human reviewer, GPT-1, GPT-7.5, or NotGPTAtAllSciAI-2026.

This is especially true in this case, because the entire point of a reviewer's review is that they are doing something the editor is not supposed to be doing! If the editor has to do deep due diligence on all reviews, the reviewers are failing to provide any value, as the editor might as well directly review the paper in question. So reputation is not something we can just wave away with "well if it was a good review it doesn't matter"; trust is a huge deal here. The editors need reviews to be properly attributed. Even if they are fine with AI reviews they need to know they are from AIs, and as I said, which AIs.


Do you also object if they do Google searches?

The reviewer certainly shouldn't depend on the results in either case. But it certainly seems reasonable to refer to references that aren't solely in their head.


That clearly has nothing to do with what I said, and the accusation is not that the reviewer "referred" to an AI.

And for the same reasons I laid out, if a reviewer thinks they need to "refer" to an AI, they should just tell the editor they're not qualified. The editor does not need a reviewer to serve as a middleman between them and a GPT. The reviewer is adding no value at that point; the editor can already "refer" to an AI themselves if they want to.


Knowing which Google search to do is valuable expertise that goes beyond copying a submission into ChatGPT.


>In a lot of contexts, whether someone leaned on an LLM--lightly or heavily--is sort of irrelevant

This isn't one of those contexts. It's called peer review for a reason. You don't get to outsource your duty to either a machine or some random person. It's explicitly you that others have vested their trust in.

>The output is either good/reasonable or it's not.

In the world of human beings this isn't the only thing that matters. Reminds me of Zizek, who pointed out that the end result of the "AI revolution" isn't going to be machines acting like humans, but the reverse: humans LARPing as machines. Humans as obtuse as robots, rather than the other way around.


Journals and academia are starting to reward reviewing papers (you can mention your reviews on your resume), so I don't think it's irrelevant. The supposed AI reviewer here is probably polluting journals with dozens of poor quality reviews. This wouldn't be possible without AI help, so that makes it a big problem!


Obviously, the review being LLM-generated is a good data point because it shows that the OP isn't merely arguing against the statements of the review itself.

It's also good for the editor to know about. LLMs represent a new acute threat to review quality that they may currently be underestimating. I've literally heard of people bragging about using ChatGPT instead of doing reviews themselves. People who aren't LLM experts don't necessarily understand their limitations or that using them in this way should be unacceptable. The editors should know so they can improve the communication of review expectations.


That's probably a good keeping-your-head-low strategy, but I can't help but feel this doesn't treat it as being as severe as it is. I don't want academia to overreact about LLMs (they have done this enough already, with the huge number of academic cheating accusations), but AI output that is entirely unchecked doesn't belong in the scientific peer review process.

Those using AI tools in such situations should be expected to remove anything from the LLM's output that they can't verify with their own expertise. Reviewing out of your expertise doesn't inevitably lead to mistakes, but unchecked AI output will.


It's about informing the only person who can get extra information, decide if it's severe and do something about it - the editor.

The author can't (and shouldn't) do anything directly about the anonymous reviewer; all the responsibility, authority, and duty lies with the editor, who at least knows who that person is.


Of course. I would do pretty much the same thing (finishing the review process first before complaining), but I would mention the suspicion of AI usage. If that aspect is missed, then it's very much possible the investigation won't lead to the necessary improvement in the organization. It's worth speaking out about such a problem if you are comfortable enough in your career to risk it, because it represents something that's going to rot the entire journal and the scientific community as a whole.


Just out of curiosity, are you aware that some lawyers have started submitting LLM-generated content as legal drafts in (US) courts of law?

Last I heard, that one had been censured by the court, but courts generally have no power over law licenses. We might have to wait awhile to find out if there will be any more serious repercussions.

I think that in many professional settings, we might in the near future discover that some large fraction have been "faking it until they make it", but without the "making it" conclusion.

The fun part is when congressional staffers use this for gigantic 10,000 page bills too large for anyone to catch it before the vote. It might already be happening.


Agree that this is a potentially system-damaging problem that will only get worse if not directly dealt with. In this case, I think the advice in the OP is good, however: address the feedback from the good review, resubmit, and once the paper is accepted, contact the editor with your concerns; at that point it's clear that you aren't objecting to needing to revise.


Yes, and the point then is that it is unchecked. Would it be different if it had been farmed out to a student - and then unchecked?

Or for that matter if it had been dictated - and then unchecked?


That could rise to the level of serious infraction too, of course. But there is one key difference that makes it worse, which is that a student is not probabilistic like an LLM is. Given enough output, no matter how well behaved you think your prompt is, the LLM will output something completely off the rails. In fact, it will give every possible output.

Can happen to students too (sleep deprivation will do it), but not with the same inevitability as the LLM tool. To phrase it another way, you could set up the 'human factor' in a way that lets you trust unchecked output from another human (e.g. checking their expertise in academia, or if they are a commercial aircraft pilot, checking their pre-flight notes on how much sleep they got), but not for LLMs.


In theory, I mostly agree with you.

In practice, my advice is for the academic, who is trying to get an article published in "one of the well-reputable journals". That is a weak hand to be playing. The journal's editor, by contrast, is in a far stronger position to hit back hard at whoever seems to be farming out their review job to a cut-rate bot.

Edit: 's/is farming/seems to be farming/'


But again, there is no evidence of AI, just incompetence and/or laziness.


One factor to consider is AI-augmented content. I'm not an academic reviewer, but I certainly will do some sort of analysis, write up some key bullet points, then ask ChatGPT to synthesize some prose for a report. I then make a few tweaks and edits and send it off. The core content is coming from my analysis, not generation, but if I'm being lazy the content ends up having the "default ChatGPT style." I could imagine this ends up being common, especially for non-native English speakers.


I support this style of usage. The impression given from this example is that the content itself, not the style, is low quality.


I've noticed this style of usage greatly reduces the quality of work my colleagues produce. The computer is great at writing stuff that looks right, but is not.

Also, at least some of the "increased productivity" boils down to "I spent less time thinking about the underlying problem while I was composing text and copy-editing".


But what about the times where your colleagues use LLMs in this way, but the output is good enough that you don't realize it?


This reminds me of the time I was 16 (almost 20 years ago). Having an interest in communication theory, I somehow ended up on an IEEE journal review list on the topic. I received a paper to review from someone in China, and I bullshitted my way through that review thinking that was the start of my academic career.


Well don’t keep us in suspense…


It was the start of a multi-million dollar lifestyle business and the proximate cause of the reproducibility issues impeding scientific progress right now.


The answer (already in the post) is to contact the editor with your concerns.

That's what they are there for.


I would run it through originality.ai, but still treat the result with caution.


Die mad about it.


I wonder how GPTZero scores it.


Unless they used GPTinf


Well, I don't really follow many YouTube channels, but I'm certainly following this one.


My takeaway: I need the person who wrote the top response on stackexchange to tell me how to run my life.


I asked ChatGPT what to do in such a case:

In this situation, it would be reasonable to raise your concerns with the journal editor. While you may not have definitive proof that the review was AI-generated, your observations and the results from the AI detection tools provide enough basis for a respectful inquiry. Expressing concerns about the review process is important for maintaining the integrity and quality of academic publishing. However, it's crucial to approach the matter diplomatically, focusing on seeking clarification rather than making accusations. Remember, the goal is to ensure constructive and relevant peer review for your paper, not to challenge the decision or the reviewer's credibility.



