RAM signaling is very sensitive to data line length, even more so if you use low-power RAM modules.
This is not a "bigger number in a useless benchmark" thing: if you can standardize the shortest, fastest routing possible with a socket, everybody can use faster RAM for less (RAM command waits are one of the biggest killers of responsiveness), and everyone gets faster, more responsive systems.
Moreover, more RAM means bigger caches in any OS, and having those caches in faster RAM means snappier systems. Lastly, games and other engineering software use great amounts of RAM and move tons of data between GPU RAM and system RAM. Having faster pipes helps.
It's not about benchmarks, it's about real world performance.
The last one is a 6 GHz, 24-core, 32-thread, 253 W behemoth, and even that has only two memory channels.
What's the benefit of having four slots, adding more wait states, and reducing bandwidth on a system this powerful? That doesn't make sense.
Instead, I'd rather have two channels on a single module and replace the whole RAM in one swoop. If you are unsure, slightly over-speccing the system at the start won't hurt in the long run.
> using swap is the biggest killer of responsiveness
I have bad news for you, then. Even if you have tons of free RAM, unused pages are still pushed to your swap. On the other hand, while swap-heavy computation is a last resort on highly loaded systems, command waits are with you from the moment you press the power button.
> On Windows? Pretty sure that both on Mac OS and Linux I can have the swap mostly unused.
Nope, on Linux. My macOS system is not using its swap at the moment, but I have seen it use swap while memory pressure was low and free memory was available. OTOH, on my 32 GB Linux system, the current state is as follows:
                  total        used        free      shared  buff/cache   available
    Mem:          31894       10051        2967         169       19498       21843
    Swap:         15722        2506       13216
This system has not been stressed to the point of filling its RAM since the last boot. It's the Linux kernel's own choice to move these pages to swap over time.
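As an aside, you can quantify this straight from a pasted `free -m` snapshot. A minimal Python sketch (the numbers are the ones above; the parsing assumes util-linux `free`'s standard column layout):

```python
# Minimal sketch: parse a `free -m` snapshot and report how much swap is in
# use even though plenty of RAM is still available. Values are in MiB and
# taken from the output pasted above.
snapshot = """\
              total        used        free      shared  buff/cache   available
Mem:          31894       10051        2967         169       19498       21843
Swap:         15722        2506       13216
"""

def parse_free(text):
    """Return {row_label: {column_name: MiB}} for a `free -m` dump."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    result = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip() truncates accordingly.
        result[label.rstrip(":")] = dict(zip(headers, map(int, values)))
    return result

stats = parse_free(snapshot)
print(stats["Swap"]["used"])      # → 2506 (MiB of swap in use)
print(stats["Mem"]["available"])  # → 21843 (MiB of RAM still available)
```

Here the kernel has pushed ~2.5 GB to swap while ~21 GB of RAM is still available, which is exactly the "swap gets used even under low memory pressure" point.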
> Or even run without swap on Linux.
This is how Kubernetes nodes and most HPC clusters run, and they hard-lock the moment you hit 0 in the "free" column.
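If you want to verify whether a host is actually running swapless (Linux only; `/proc/swaps` contains just its header line when no swap is configured), a quick sketch:

```python
# Minimal check (Linux only): /proc/swaps lists one line per active swap
# device after a header line, so a swapless host shows only the header.
with open("/proc/swaps") as f:
    lines = f.read().splitlines()

active_swaps = max(len(lines) - 1, 0)
swapless = active_swaps == 0
print("swapless" if swapless else f"{active_swaps} swap device(s) active")
```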
It's not really a vendor thing but a signal integrity issue.
CAMM2 just makes a lot more sense with DDR5. And the fact is that we most likely want the next standard to be even faster, so the issue is only going to get worse with DDR6 or whatever comes next.
Ah, so you recognize that we customers are being fucked over by the vendors' chase for bigger numbers in benchmarks. Got it, thanks.
Edit: based on all the explanations I got, DDR5 reminds me of ... the Pentium 4.