I think the point is that laptops are more limited than other form factors. I’m reading it as a response to the comment that MacBooks are computers optimized for AI and only technically laptops (which is a pretty ridiculous statement imo). Apple’s architecture happens to be very good at a lot of compute-heavy tasks, especially where total available GPU RAM and low-latency handoff between the CPU and the GPU are concerned. That happens to be very well suited to LLM workloads.
Apple M3/M4 silicon is certainly good in some ways, but the bottlenecks are often the lack of CUDA software support and the price (you could buy >4x the raw GPU performance with a dual RTX 5090 desktop.) =3
The key features of the M3 Ultra are 512GB of shared CPU/GPU RAM and ultra-fast LAN over peripheral cabling.
Once an NVIDIA card has a model loaded into its VRAM, it doesn't get hit with the data-copy cost over the bus.
Yet as many people have noted, who cares if the M3 Ultra takes four times as long when the faster alternative simply won't fit the larger models? YMMV =3
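The fit-vs-speed tradeoff above is easy to sanity-check with back-of-envelope arithmetic. A rough sketch (the 512GB figure is the M3 Ultra's max unified memory; the 405B/4-bit example model and the 64GB dual-5090 VRAM total are illustrative assumptions, and KV cache and activation overhead are ignored):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB: params * bits / 8, ignoring
    KV cache, activations, and runtime overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Hypothetical example: a 405B-parameter model quantized to 4 bits per weight.
size = model_size_gb(405, 4)
print(f"model weights: ~{size:.1f} GB")
print("fits in 512 GB unified memory:", size < 512)
print("fits in 64 GB of dual-GPU VRAM:", size < 64)
```

So even at aggressive quantization, a model in that size class only fits on the unified-memory machine, however much faster the discrete GPUs would be if it did fit.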