I'm running the same setup - our larger builders have two 32-core EPYCs with 2TB RAM. We were already doing that type of setup almost two decades ago at a different company, and at this one for over a decade now - back then that was the only option for speed.
Nowadays NVMe drives might indeed be able to get close - but we'd probably still need to span multiple SSDs (reducing the cost savings), and the developers there are incredibly sensitive to build times. If a 5-minute build suddenly takes 30 seconds more, we have some unhappy developers.
Another reason is that it'd eat SSDs like candy. Current enterprise SSDs have something like a 10,000 TBW rating, which we'd exceed in the first month. So we'd either get cheap consumer SSDs and replace them every few days, or enterprise SSDs and replace them every few months - or stick with the RAM setup, which over the life of the build system will be cheaper than constantly buying SSDs.
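As a rough sanity check of what that claim implies (my assumed numbers, not measured figures from the build cluster):

```python
# Back-of-the-envelope: if a drive's 10,000 TBW rating is exhausted
# within the first month, how much is being written to it per day?
TBW_RATING = 10_000      # terabytes written, typical enterprise endurance rating
DAYS_TO_EXHAUST = 30     # "exceed in the first month"

writes_per_day_tb = TBW_RATING / DAYS_TO_EXHAUST
print(f"~{writes_per_day_tb:.0f} TB written per day per drive")  # ~333 TB/day
```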
> Current enterprise SSDs have something like a 10000 TBW rating
Running the numbers to verify: a read/write-mixed enterprise SSD will typically be rated for 3 DWPD (drive writes per day) across its 5-year warranty. At 2 TB, that would be 10,950 TBW, so that roughly checks out. If endurance were a concern, upgrading to a higher capacity would linearly increase the endurance - for example the Kioxia CD8P-V. https://americas.kioxia.com/en-us/business/ssd/data-center-s...
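Spelling out that arithmetic (figures as assumed in the comment above: 3 DWPD, 2 TB capacity, 5-year warranty):

```python
# TBW = DWPD * capacity * warranty period in days
dwpd = 3                 # drive writes per day
capacity_tb = 2          # drive capacity in TB
warranty_years = 5

tbw = dwpd * capacity_tb * 365 * warranty_years
print(f"{tbw} TBW")      # 10950 TBW, close to the ~10,000 TBW quoted above
# Doubling capacity to 4 TB at the same DWPD doubles the endurance to 21,900 TBW.
```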
Finding it a bit hard to imagine build machines working that hard, but I could believe it!
Same as the one earlier in the thread: Build servers, nicely loaded. A build generates a ridiculous amount of writes for stuff that just gets thrown out after the build.
We actually did try SSDs about 15 years ago, and had a lot of dead SSDs in a very short time. After that we went back to estimating data written first; it's cheaper. While SSD durability has increased a lot since then, everything else got faster as well - so SSDs would last a bit longer now (back then replacing them was a weekly thing), but still nowhere near the point where it'd be a sensible thing to do.
You don't really want that. I'm only keeping my sanity there because my small company runs their CI and testing as a contractor.
They are indeed quite spoiled - and that's not necessarily a good thing. Part of the issue is that our CI was good and fast enough that at some point a lot of the new hires never bothered to figure out how to build the code - so for quite a few the workflow is "commit to a branch, push it, wait for CI, repeat". And as they often just work on a single problem, the "wait" is time lost for them, which leads to unhappiness if we are too slow.