In the PL field, conferences have started to allow authors to submit packaged artifacts (typically source code, input data, training data, etc.) that are evaluated separately, typically post-review. The artifacts are evaluated by a separate committee, usually graduate students, and as usual everything is volunteer work. Even with explicit instructions, it is hard enough to get the same code to run in a different environment and produce the same results. Would "replication" of a software technique require another team to reimplement it from scratch? That seems unworkable.
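To make that concrete, here is a minimal sketch of the kind of environment check an artifact's entry script might start with. It is purely illustrative: the tool names, the expected Python version, and the run_experiments.py script are assumptions for the example, not taken from any real artifact.

```python
# Hypothetical artifact entry point: record and check the environment before
# running experiments, since "same code, different machine" usually breaks here.
import platform
import shutil
import subprocess
import sys

EXPECTED_PYTHON = (3, 10)          # assumed version the artifact was tested with
REQUIRED_TOOLS = ["cmake", "z3"]   # assumed external dependencies

def check_environment() -> None:
    if sys.version_info[:2] != EXPECTED_PYTHON:
        print(f"warning: tested with Python {EXPECTED_PYTHON}, "
              f"found {sys.version_info[:2]}", file=sys.stderr)
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            sys.exit(f"error: required tool '{tool}' not found on PATH")
    # Log the platform so evaluators can explain result differences later.
    print("host:", platform.platform(), platform.machine())

if __name__ == "__main__":
    check_environment()
    subprocess.run([sys.executable, "run_experiments.py"], check=True)
```

Even a check like this only catches the easy failures; it does nothing about differing hardware, OS libraries, or timing behaviour.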
I can't even imagine how hard it would be to write instructions for another lab to successfully replicate an experiment at the forefront of physics, chemistry, or biology. It's not just the specialized equipment; we're talking about the frontiers of science, with people doing cutting-edge research.
I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
Is PL theory actually science? Although we call it computer science, I don't personally think CS is actually a science in the sense of studying nature to understand it. Computers are artificial constructs. CS is a lot closer to engineering than science. Indeed it's kind of nonsensical to talk about replicating an experiment in programming language theory.
For the "hard" sciences, replication often isn't so difficult it seems. LK-99 being an interesting study in this, where people are apparently successfully replicating an experiment described in a rushed paper that is widely agreed to lack sufficient details. It's cutting edge science but replication still isn't a problem. Most science isn't the LHC.
The real problems with replication are found in the softer fields. There it's not just an issue of randomness or the difficulty of doing the experiments; if that were all there was to it, no problem. In these fields it's common to find papers, or entire fields, where none of the work is replicable even in principle. As in, the people doing it don't think it's important at all that others can replicate their work, and they may go out of their way to prevent it (most frequently by gathering data in non-replicable ways and then withholding it deliberately, but sometimes it's just due to the design of the study). The most obvious inference when you see this is that maybe they don't want replication attempts because they know their claims probably aren't true.
So even if peer reviewers or journals just checked really basic things, like whether a claim is even replicable in principle, that would be a good start. You would still be left with a lot of papers that replicate fine but whose conclusions are wrong because their methodology is illogical, or papers that replicate because their findings are obvious. But there's so much low-hanging fruit.
Well, there are entire areas of CS research tangential to PLs (say, SAT/SMT, software verification, program synthesis) where the fundamental problems are known to be NP-complete, exponential, or undecidable. So it is pretty hard to declare a new approach "superior" on purely theoretical grounds: you usually have to run some benchmarks and see how your new approach compares to existing alternatives. And you want these benchmarks to be replicable across different machines and platforms.
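As a rough illustration of what that looks like in practice, here is a hedged sketch of such a benchmark harness in Python. The solver binary (z3), the benchmarks/ directory of .smt2 instances, and the timeout are made-up assumptions; the point is just that each run is timed and the host metadata is recorded alongside the results.

```python
# Sketch of a benchmark harness: run a solver on each instance, record
# wall-clock time plus host metadata, and dump everything to JSON.
import json
import platform
import subprocess
import time
from pathlib import Path

SOLVER = "z3"                      # assumed solver binary
INSTANCES = Path("benchmarks")     # assumed directory of .smt2 instances
TIMEOUT_S = 60

def run_one(instance: Path) -> dict:
    start = time.perf_counter()
    try:
        proc = subprocess.run([SOLVER, str(instance)], capture_output=True,
                              text=True, timeout=TIMEOUT_S)
        lines = proc.stdout.strip().splitlines()
        result = lines[-1] if lines else "unknown"
    except subprocess.TimeoutExpired:
        result = "timeout"
    return {"instance": instance.name,
            "result": result,
            "seconds": round(time.perf_counter() - start, 3)}

if __name__ == "__main__":
    report = {
        "host": {"machine": platform.machine(), "system": platform.platform()},
        "runs": [run_one(p) for p in sorted(INSTANCES.glob("*.smt2"))],
    }
    Path("results.json").write_text(json.dumps(report, indent=2))
```

Even with the metadata, only relative comparisons on the same host are really meaningful; the recorded context just makes cross-machine differences explainable rather than mysterious.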
Yes, performance involves experiments, but are they scientific experiments? I'm not sure it matters either way; it's a purely semantic debate. The issues that create replicability problems in CS are pretty different from the ones that create issues in other fields. My experience was that they're purely engineering problems rather than problems of incentives or non-replicable designs. If the issue were just that some papers occasionally don't replicate because the authors forgot a detail, got queried, and updated the paper, nobody would care. It gets attention because that's sadly not what happens.
> I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
Not to mention that the cutting edge in many sciences is perhaps two or three research groups of 5-30 individuals each, at various research institutions around the world.
Unfortunately, no. Dockerfiles aren't as portable as you think, and they aren't architecture-independent (a sketch below illustrates the point). VMs are better, but even then, performance isn't portable either.
The last artifact I produced included builds of 3 web browsers from source; it was over 10GB. One doesn't just "build Chrome in a Dockerfile".
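As a small illustration of the architecture problem mentioned above, here is a hypothetical guard an artifact could include; the expected architecture is an assumption made up for the example.

```python
# Hypothetical guard an artifact might ship with: warn when the host
# architecture differs from the one the published numbers were measured on,
# since timings don't transfer across x86_64/arm64 or under emulation.
import platform
import sys

MEASURED_ON = "x86_64"   # assumed architecture used for the original benchmarks

host_arch = platform.machine()
if host_arch != MEASURED_ON:
    print(f"warning: the published numbers were measured on {MEASURED_ON}, but "
          f"this host reports {host_arch}; absolute timings will not be "
          f"comparable, and an {MEASURED_ON} container image may be running "
          f"under emulation here.", file=sys.stderr)
```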