Apple Silicon Macs might not have great GPUs but they do have unified memory. I ...

diffeomorphism · on July 26, 2023

What does "unified" actually mean and how much would that help? It is still off the shelf LPDDR5‑6400, just with a better interconnnect (like a ps5).

How does this compare to non-unified ddr5 or hmb2e as on nvidia A100 cards?

lhl · on July 26, 2023

The benefits are primarily price - 96GB of VRAM would be 4x3090/4090 (~$6K) or 2xA6000 (~$8-14K) cards (also, looks like you can buy an 80GB A100 PCIe for about $15K atm). While Apple is using LPDDR5, it is also running a lot more channels than comparable PC hardware. The M2 has 100GB/s, M2 Pro 200GB/s, M2 Max 400GB/s, and M2 Ultra is 800GB/s (8 channel) of memory bandwidth. The Nvidia cards are about 900GB/s-1TB/s (A100 PCIe gets up to 1.5TB/s).

In practice, on quantizes of the larger open LLMs, an M2 Ultra can currently inference about 2-4X faster than the best PC CPUs I've seen (mega Epyc systems), but also about 2-4X slower than 2x4090s.

diffeomorphism · on July 26, 2023

That is useful info, but still does not quite address the question.

The question was how memory type, memory amount and bandwidth factor into actual performance. So let me rephrase: Given a budget of $X, what performance/limitations should you expect with

- 256GB of non-unified DDR5 in a PC, just CPU

- 128GB of DDR5 for an APU

- 96GB of unified DDR5

- Whatever Nvidia will sell you for $X.

An answer of "just compare a single memory bandwidth number" seems a bit short. Sure, more bandwidth helps, but is half as much RAM at double bandwidth better or worse?

ErneX · on July 26, 2023

No idea, I just said I wanted to try this out and see how it performs.

Doesn’t VRAM amount limit the size of the model you can load? I’m not talking about training just inference. I also pointed out these are not the greatest GPUs available, just that the advantage they have is being able to address more memory since on those machines is a shared block between system and GPU.

thejosh · on July 26, 2023

It's a term used to justify non replaceable parts :-).