Latency is significantly impacted by the physical distance between RAM and CPU’s. Which combined with modern cache sizes means they get diminishing returns trying to minimize it and thus make different tradeoffs.
Core-to-RAM latency is in the neighborhood of 50 ns (well-tuned Intel system with low-latency memory) to ~80 ns (bottom-of-the-barrel system). At propagation speed, that's about 10 meters. A big chunk of this latency is internal to the CPU (so is not influenced by distance to the memory at all), another big chunk is the inherent slowness of accessing a DRAM array (10+ ns, independent of the location of the memory).
It's worth pointing out how little this has changed over the past decades. A 2006 AMD CPU is 100 % competitive in regards to memory latency with Intel's 2020 flagship desktop CPU.
10 meters one way = 5 meters round trip. Trace the longest physical path a signal travels from your CPU to a memory chip on the DIMM and back, it’s likely longer than you think. And yea on it’s own plenty of overhead, but everyone making latency tradeoffs bashing their design as part of a larger system with a single unavoidable limit.