In the PL field, conferences have started to allow authors to submit packaged artifacts (typically source code, input data, training data, etc.) that are evaluated separately, typically post-review. The artifacts are evaluated by a separate committee, usually graduate students, and as usual everything is volunteer work. Even with explicit instructions, it is hard enough to get the same code to run in a different environment and produce the same results. Would "replication" of a software technique require another team to reimplement it from scratch? That seems unworkable.
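To make that concrete, here is a minimal sketch of the kind of environment check an artifact's entry script might start with. It is purely illustrative: the tool names, the expected Python version, and the run_experiments.py script are assumptions for the example, not taken from any real artifact.

```python
# Hypothetical artifact entry point: record and check the environment before
# running experiments, since "same code, different machine" usually breaks here.
import platform
import shutil
import subprocess
import sys

EXPECTED_PYTHON = (3, 10)          # assumed version the artifact was tested with
REQUIRED_TOOLS = ["cmake", "z3"]   # assumed external dependencies

def check_environment() -> None:
    if sys.version_info[:2] != EXPECTED_PYTHON:
        print(f"warning: tested with Python {EXPECTED_PYTHON}, "
              f"found {sys.version_info[:2]}", file=sys.stderr)
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            sys.exit(f"error: required tool '{tool}' not found on PATH")
    # Log the platform so evaluators can explain result differences later.
    print("host:", platform.platform(), platform.machine())

if __name__ == "__main__":
    check_environment()
    subprocess.run([sys.executable, "run_experiments.py"], check=True)
```

Even a check like this only catches the easy failures; it does nothing about differing hardware, OS libraries, or timing behaviour.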
I can't even imagine how hard it would be to write instructions for another lab to successfully replicate an experiment at the forefront of physics, chemistry, or biology. It's not just the specialized equipment; we're talking about the frontiers of science, with people doing cutting-edge research.
I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
Is PL theory actually science? Although we call it computer science, I don't personally think CS is actually a science in the sense of studying nature to understand it. Computers are artificial constructs. CS is a lot closer to engineering than science. Indeed it's kind of nonsensical to talk about replicating an experiment in programming language theory.
For the "hard" sciences, replication often isn't so difficult it seems. LK-99 being an interesting study in this, where people are apparently successfully replicating an experiment described in a rushed paper that is widely agreed to lack sufficient details. It's cutting edge science but replication still isn't a problem. Most science isn't the LHC.
The real problems with replication are found in the softer fields. There it's not just an issue of randomness or the difficulty of doing the experiments; if that were all there was to it, no problem. In these fields it's common to find papers, or entire fields, where none of the work is replicable even in principle. As in, the people doing it don't think it's important at all that others can replicate their work, and they may go out of their way to prevent it (most frequently by gathering data in non-replicable ways and then withholding it deliberately, but sometimes it's just due to the design of the study). The most obvious inference when you see this is that maybe they don't want replication attempts because they know their claims probably aren't true.
So even if peer reviewers or journals just checked really basic things, like whether a claim is even replicable in principle, that would be a good start. You would still be left with a lot of papers that replicate fine but whose conclusions are wrong because their methodology is illogical, or papers that replicate because their findings are obvious. But there's so much low-hanging fruit.
Well, there are entire areas of CS research tangential to PLs (say, SAT/SMT, software verification, program synthesis) where the fundamental problems are known to be NP-complete, exponential, or undecidable. So it is pretty hard to declare a new approach "superior" on purely theoretical grounds: you usually have to run some benchmarks and see how your new approach compares to existing alternatives. And you want these benchmarks to be replicable across different machines and platforms.
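As a rough illustration of what that looks like in practice, here is a hedged sketch of such a benchmark harness in Python. The solver binary (z3), the benchmarks/ directory of .smt2 instances, and the timeout are made-up assumptions; the point is just that each run is timed and the host metadata is recorded alongside the results.

```python
# Sketch of a benchmark harness: run a solver on each instance, record
# wall-clock time plus host metadata, and dump everything to JSON.
import json
import platform
import subprocess
import time
from pathlib import Path

SOLVER = "z3"                      # assumed solver binary
INSTANCES = Path("benchmarks")     # assumed directory of .smt2 instances
TIMEOUT_S = 60

def run_one(instance: Path) -> dict:
    start = time.perf_counter()
    try:
        proc = subprocess.run([SOLVER, str(instance)], capture_output=True,
                              text=True, timeout=TIMEOUT_S)
        lines = proc.stdout.strip().splitlines()
        result = lines[-1] if lines else "unknown"
    except subprocess.TimeoutExpired:
        result = "timeout"
    return {"instance": instance.name,
            "result": result,
            "seconds": round(time.perf_counter() - start, 3)}

if __name__ == "__main__":
    report = {
        "host": {"machine": platform.machine(), "system": platform.platform()},
        "runs": [run_one(p) for p in sorted(INSTANCES.glob("*.smt2"))],
    }
    Path("results.json").write_text(json.dumps(report, indent=2))
```

Even with the metadata, only relative comparisons on the same host are really meaningful; the recorded context just makes cross-machine differences explainable rather than mysterious.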
Yes, performance involves experiments, but are they scientific experiments? I'm not sure it matters either way; it's a purely semantic debate. The issues that create replicability problems in CS are pretty different from the ones that create issues in other fields. My experience was that they're purely engineering problems rather than problems of incentives or non-replicable designs. If the issue were just that some papers occasionally don't replicate because the authors forgot a detail, got queried, and updated the paper, nobody would care. It gets attention because that's sadly not what happens.
> I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
Not to mention that the cutting edge in many sciences is perhaps two or three research groups of 5-30 individuals each, at various research institutions around the world.
Unfortunately, no. Dockerfiles aren't as portable as you think, and they aren't architecture-independent (a sketch below illustrates the point). VMs are better, but even then, performance isn't portable either.
The last artifact I produced included builds of 3 web browsers from source; it was over 10GB. One doesn't just "build Chrome in a Dockerfile".
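As a small illustration of the architecture problem mentioned above, here is a hypothetical guard an artifact could include; the expected architecture is an assumption made up for the example.

```python
# Hypothetical guard an artifact might ship with: warn when the host
# architecture differs from the one the published numbers were measured on,
# since timings don't transfer across x86_64/arm64 or under emulation.
import platform
import sys

MEASURED_ON = "x86_64"   # assumed architecture used for the original benchmarks

host_arch = platform.machine()
if host_arch != MEASURED_ON:
    print(f"warning: the published numbers were measured on {MEASURED_ON}, but "
          f"this host reports {host_arch}; absolute timings will not be "
          f"comparable, and an {MEASURED_ON} container image may be running "
          f"under emulation here.", file=sys.stderr)
```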