
It sounds like we're in a similar position. I had no desire to get a 64 GB laptop from Apple until all the interesting work on running LLaMA locally came out; I wasn't even aware of the specific benefit of the unified memory model on the Mac. Now I'm weighing whether to go 64, 96, or 128 GB, for an insane amount of money: about $5k for the top-end one.
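For rough scale (back-of-the-envelope only: approximate parameter counts, weights only, no KV cache or activation overhead, so real headroom needs to be higher than this):

    GIB = 1024 ** 3

    # Approximate LLaMA parameter counts
    models = {"7B": 6.7e9, "13B": 13.0e9, "30B": 32.5e9, "65B": 65.2e9}
    bytes_per_param = {"fp16": 2.0, "int8": 1.0, "4-bit (approx)": 0.5}

    for name, params in models.items():
        row = ", ".join(f"{fmt}: {params * b / GIB:6.1f} GiB"
                        for fmt, b in bytes_per_param.items())
        print(f"LLaMA-{name}: {row}")

So 65B in fp16 (~121 GiB) doesn't really fit even on the 128 GB machine once you leave room for the OS, but a 4-bit 65B (~30 GiB) or an fp16 30B (~60 GiB) does.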


The unified memory ought to be great for running LLaMA on the GPU on these MacBooks (since it can't run on the Neural Engine currently).

The point of llama.cpp is that most people don't have a GPU with enough memory; Apple's unified memory ought to solve that.
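As a minimal illustration (a sketch, not what the repo below does): on Apple Silicon the mps device allocates out of the same unified pool as system RAM, so the GPU can hold working sets that would never fit on a typical 8-16 GB discrete card.

    import torch

    assert torch.backends.mps.is_available(), "needs Apple Silicon + a PyTorch build with MPS"

    # 24 x 1 GiB fp16 buffers resident on the GPU device. A typical discrete
    # card would run out of VRAM; with unified memory this just consumes
    # system RAM. (Illustrative only -- scale the count to your machine.)
    chunks = [torch.empty(512 * 1024 * 1024, dtype=torch.float16, device="mps")
              for _ in range(24)]
    total = sum(c.element_size() * c.nelement() for c in chunks)
    print(f"{total / 1024**3:.0f} GiB on the mps device")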

Some people have it working apparently:

https://github.com/remixer-dec/llama-mps


Thank you, that's exactly what I was looking for: specific info on performance.


I think GPU inference performance is currently limited by the immaturity of PyTorch's MPS (Metal) backend.

Before I found the repo above, I made a naive attempt to get LLaMA running with MPS and it didn't "just work": a bunch of ops weren't supported, etc.
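For anyone hitting the same wall, the stopgap I'm aware of (not specific to that repo) is PyTorch's CPU fallback for missing MPS ops, something like:

    import os

    # Ops without an MPS kernel fall back to the CPU instead of raising
    # NotImplementedError -- slow, but the run completes. Has to be set
    # before torch is imported.
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"running on {device}")

    # Hypothetical load of a locally converted checkpoint (placeholder path):
    # from transformers import AutoModelForCausalLM
    # model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf",
    #                                              torch_dtype=torch.float16).to(device)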



