My understanding is that the performance comes from Apple silicon being optimized for the NeXT/ObjC retain/release model: they got the cost of a retain/release down from roughly 30 nanoseconds on Intel to 6.5ns on M1 (14ns when running emulated Intel code).
IMO 8gb is still barely acceptable for most consumers in 2024, but not for any laptop that you'd expect to get more than, say, two years of usefulness out of, so… yeah, it's about time.
I’m not exactly an expert on this, but I do know from experience that Objective-C programming (and anything coming out of an ARC compiler, like Swift) is a constant game of grabbing memory, using it, and releasing it. My impression is that if you can release memory faster, that means greater availability in the pool.
Guesses aside, I spent a lot of time in Xcode on an 8gb M1 while waiting for the M1 Pro to come out three years ago. It was great, performing much better than the 16gb Intel it was replacing. I still don't think 8gb is really a big deal for most people; getting more is mostly important for speculative future needs.
The behavior you describe would be a function of the memory allocator in use (system allocator, custom allocator) and independent of the hardware. It's at a much, much higher level than the hardware or even the page compression.
At the risk that I’ve missed a joke, the unified memory architecture reduces the amount of RAM you have available since “RAM” for any computer with discrete graphics doesn’t include VRAM.
I think they are saying it from the perspective that, rather than two separate banks which are rarely both full, it's one bank which is only ever full when 100% of the hardware is in use. That said, I think many miss that this is how integrated-graphics memory management has worked on x86 systems for many, many years already. I.e. the "dedicated RAM" slider in the boot firmware is a legacy holdover that should be set to a minimum token value, not something which determines the limit the iGPU has access to. macOS also works this way: a token amount of the unified memory is still reserved for the iGPU, you just can't adjust that amount higher, since there's no legacy holdover that would make doing so make sense.
They are saying it from the perspective of the RAM being on the same package as the CPU. This is one of the innovations of the Apple Silicon architecture as it SIGNIFICANTLY reduces memory access latency.
It's not just TSMC's 3nm process. It's also Apple engineering.
This engineering was common in x86 CPUs by 2013, when AMD introduced Heterogeneous System Architecture, which utilized heterogeneous Uniform Memory Access. The approach has its upsides and downsides, the main downside being that a unified bus tends to have much lower overall bandwidth (even in the Max) and runtime scalability issues. The upsides are more obvious for the types of systems people want APUs for in the first place, though, so that's usually fine.
The main bit of engineering Apple should be lauded for in the memory department is the gumption to throw hundreds of GB/s at the mid- and high-end models.
HSA was not “unified” in the modern sense. It still required designating memory as GPU-side or CPU-side, and these implied different cache-coherency rules that meant memory couldn't actually be shared, by default. To actually share memory you had to use the special coherent “Onion” bus, which guaranteed visibility and ordering but massively slowed down performance. Similarly, it was also impossible for the GPU to touch CPU memory unless it was pinned and tagged for the non-coherent “Garlic” bus, but at least that path was relatively fast iirc.
In contrast, Apple actually has everything tied into a single unified address space with a single controller that immediately makes all writes visible regardless of where they happen.
They’ve also got enormously more memory bandwidth to play with. M1 Max is close to PS5 in both shader configuration and memory bandwidth.
Generally it's more about the number of chips you need. If you can get to x GB of RAM with the same number of memory chips, just at higher capacities, then the power difference is truly quite minuscule. If you have to double up chips to reach the capacity then you start drawing more power (though still on the order of a couple watts' difference at most at those sizes). Even then it's not always a constant couple of watts; RAM plus memory controllers use less power when you're not actively writing data.