This is cool - time travel debugging is potentially really helpful in these flaky situations. Having tests fail unpredictably basically means you're not controlling the thing they test, which can be scary.
But I find the most annoying CI failures are the ones I can't reproduce any other way, and they somehow always happen when I'm not trying.
Run locally? Fine.
Run on a cloud machine that's identical to the CI system? Fine.
Run multiple instances of the test on the cloud machine to generate more load? Fine.
Run in the overnight tests - blam.
Even once I've found the bug, this pattern doesn't always make sense; sometimes the timing just shakes out that way.
Sometimes we record stubborn tests that are acting weird, so we're ready when they next fail.
I never get tired of funny misreads of product names. I thought "our own busted framework" was self-deprecation until it showed up in slab serif on the next line.
This is a good cautionary tale for folks jumping on cheap ARM cloud instances. Different architectures mean different bugs, and depending on how your infra is provisioned, that could make reproducing them even harder.
https://rr-project.org/