I mean, what you describe is an unfortunate and unavoidable issue in academia (and in the world in general). GPT4 doesn't work magic here, of course.
You still have to:
1) Understand the work and the motivation. (GPT4 can help by playing the role of a junior PhD if you can play the role of an astute advisor.)
2) Sniff out things that are underspecified or seem wrong. (GPT4 can also help here, see above and the sketch after this list.)
3) Email the authors with questions, compare against shitty published codebases, etc., depending upon how gnarly/rushed the prose is.
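For concreteness, here's one way step 2 might look in practice. This is a minimal sketch of my own setup, not a prescribed workflow: it uses the OpenAI Python client to ask GPT4 to play the junior-PhD role and flag underspecified details in a method section. The file name and prompt are assumptions for illustration.

    # Illustrative sketch only: prompting GPT4 to act as a "junior PhD" reviewer.
    # The prompt, file name, and setup are my own assumptions, not a prescribed workflow.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    method_section = open("method_section.txt").read()  # hypothetical paper excerpt

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a junior PhD student reimplementing a paper. "
                        "List every hyperparameter, architectural detail, or training "
                        "choice that is underspecified or looks inconsistent."},
            {"role": "user", "content": method_section},
        ],
    )
    print(response.choices[0].message.content)

You still have to do the astute-advisor part yourself: decide which of the flagged gaps actually matter and which are answered elsewhere in the paper.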
With that said, it's also a "research smell" (compare "code smell") if a paper is so hastily written and undercited that you're the first person replicating it. Maybe instead of going for "my new bleeding-edge approach that scored 0.1% better than the boring old model with 50 cites", you should probably just implement the boring old model.
So, where this has been successful for me is in implementing denoising diffusion for different problem domains. Given that there is broad literature on denoising diffusion, when some things are underspecified you can start looking at best practices from other researchers.
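To make the "broad literature" point concrete: the core training step is basically the standard DDPM epsilon-prediction objective (Ho et al., 2020), so when a paper leaves details out you can fall back on it. A minimal sketch, assuming a PyTorch module eps_model(x_t, t) that predicts the added noise; the schedule and shapes are illustrative defaults, not any specific paper's settings.

    # Minimal DDPM-style training step (illustrative sketch, not any one paper's code).
    # Assumes a PyTorch module eps_model(x_t, t) that predicts the noise added at step t.
    import torch
    import torch.nn.functional as F

    T = 1000                                   # number of diffusion steps
    betas = torch.linspace(1e-4, 0.02, T)      # standard linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product \bar{alpha}_t

    def diffusion_loss(eps_model, x0):
        """One training step of the simple epsilon-prediction objective."""
        b = x0.shape[0]
        t = torch.randint(0, T, (b,), device=x0.device)       # random timestep per sample
        noise = torch.randn_like(x0)                           # epsilon ~ N(0, I)
        ab = alpha_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
        x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise       # forward noising q(x_t | x_0)
        return F.mse_loss(eps_model(x_t, t), noise)            # predict the added noise

Everything the paper does specify (the network, the conditioning, the data) slots in around this; everything it doesn't, you can borrow from the surrounding literature.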
The same goes for other well-studied components, like specific transformer architectures.
Basically what I'm saying is that if you're trying to reimplement something that niche, it's like catching butterflies. A better research agenda involves working within a particular field of study where there is supporting evidence and approaches to compare and contrast against. This goes without saying, regardless of whether an AI is involved or not.