I’m not so sure it’s negligible. My anecdotal experience is that since Apple Silicon chips were found to be “ok” enough to run inference with MLX, more non-technical people in my circle have asked me how they can run LLMs on their Macs.
A smaller market than gamers or datacenters, for sure.
It's annoying. I do LLMs for work and have a bit of an interest in them, plus doing stuff with GANs etc.
I have a bit of an interest in games too.
If I could get one platform for both, I could justify 2k, maybe a bit more.
I can't justify that for just one half: running games on a Mac, which right now means going through Linux? No thanks.
And on the PC side, Nvidia consumer cards only go up to 24GB, which is a bit limiting for LLMs while also being very expensive - and I only play games every few months.
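For rough context on why 24GB feels limiting, here's a back-of-envelope sketch (the 4-bit quantization and ~1.2x runtime overhead figures below are illustrative assumptions, not measurements):

```python
# Rough memory needed to hold a quantized model's weights (illustrative only).
# Assumes 4-bit weights plus a ~1.2x overhead for KV cache and activations.
def vram_gb(params_billion: float, bits: float = 4.0, overhead: float = 1.2) -> float:
    weights_gb = params_billion * bits / 8  # billions of params x (bits/8) bytes each ~= GB
    return weights_gb * overhead

for size in (8, 24, 70):
    print(f"{size}B @ 4-bit: ~{vram_gb(size):.0f} GB")
# 8B (~5 GB) fits a 24GB card with room to spare; 70B (~42 GB) does not,
# which is where 64GB+ of memory starts to matter.
```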
The new $2k card from Nvidia will be 32GB but your point stands. AMD is planning a unified chiplet based GPU architecture (AI/data center/workstation/gaming) called UDNA, which might alleviate some of these issues. It's been delayed and delayed though - hence the lackluster GPU offerings from team Red this cycle - so I haven't been getting my hopes up.
Maybe (LP)CAMM2 memory will make running models just cheap enough that I can set up a hosting server for them and keep doing my usual midrange gaming GPU thing before then.
I mean negligible to their bottom line. Tons of units may or may not get bought, but the margin on a single datacenter system would buy tens of these.
It’s purely an ecosystem play imho. It benefits the kind of people who will go on to make potentially cool things and will stay loyal.
> It’s purely an ecosystem play imho. It benefits the kind of people who will go on to make potentially cool things and will stay loyal.
It will be massive for research labs. Most academics have to jump through a lot of hoops to get to play with not just CUDA, but also GPUDirect/RDMA/InfiniBand etc. If you get older/donated hardware, you may have a large cluster but not the newer features.
They have, because until now Apple Silicon was the only practical way for many to work with larger models at home: the machines can be configured with 64-192GB of unified memory. Even the laptops can be configured with up to 128GB of unified memory.
Performance is not amazing (roughly 4060 level, I think?) but in many ways it was the only game in town unless you were willing and able to build a multi-3090/4090 rig.
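For anyone curious, here's a minimal sketch of what running a model locally with MLX looks like, using the mlx-lm package (the model name is just an example from the mlx-community hub, and the API details may shift between versions):

```python
# Minimal local LLM inference on Apple Silicon via MLX.
# Requires: pip install mlx-lm  (Apple Silicon Macs only)
from mlx_lm import load, generate

# Example quantized model; any mlx-community model should load the same way.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Explain unified memory in one paragraph.",
    max_tokens=256,
)
print(response)
```

The weights sit in the same unified memory the CPU uses, which is why a 128GB laptop can load models that no single consumer GPU can.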
I'm currently wondering how likely it is I'll get into deeper LLM usage, and therefore how much Apple Silicon I need (because I'm addicted to macOS). So I'm some way closer to your steel man than you'd expect. But I'm probably a niche within a niche.
Doubt it; a year ago, useful local LLMs on a Mac (via something like ollama) were barely taking off.
If what you say is true, you were among the first 100 people on the planet doing this, which, btw, further supports my argument about how extremely rare that use case is for Mac users.
People were running llama.cpp on Mac laptops in March 2023 and Llama2 was released in July 2023. People were buying Macs to run LLMs months before M3 machines became available in November 2023.