Six months ago I had almost given up on local LLMs - they were fun to try, but so much less useful than Sonnet 3.5 / GPT-4o that it was hard to justify using them.
That's changed in the past two months. Llama 3 70B, Qwen 32B and now these R1 models are really impressive, to the point that I'm considering trying to get real work done with them.
The catch is RAM: I have 64GB, but loading up a current GPT-4 class model uses up around 40GB of that - which doesn't leave much for me to run Firefox and VS Code.
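(For back-of-envelope purposes, the weights alone take roughly parameter count × bits-per-weight / 8 bytes. A quick sketch - the model sizes and quantization levels are just illustrative, and real usage adds KV cache and runtime overhead on top:)

    # Rough weight-memory estimate at different quantization levels.
    # Illustrative only: real usage adds KV cache and runtime overhead.
    def weight_gb(params_billions, bits_per_param):
        return params_billions * 1e9 * bits_per_param / 8 / 1e9

    for name, params in [("70B", 70), ("32B", 32)]:
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB")

A 70B model quantized to 4 bits is ~35GB of weights before overhead, which is how you end up at ~40GB used on a 64GB machine.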
So I'm still not likely to use them on a daily basis - but it does make me wonder if I should keep this laptop around as a dedicated server next time I upgrade.
One reason why I'm asking is that I'm in the market for a new laptop and am wondering whether it's worth spending more for the possible benefits of being able to run ~30-40GB local LLMs.
Unfortunately it doesn't look as if the answer is either "ha ha, obviously not" or "yes, obviously". (If the question were only about models available right now, I think the answer would be no - but they seem close enough to being useful that I'm reluctant to bet against them being genuinely useful a year from now.)
Yeah, it's not an obvious answer at all. Spending ~$3,000+ on a laptop to run local models is only economically sensible if you are VERY paranoid about using APIs (there are plenty of API providers that I personally trust not to train on my data) - otherwise that $3,000 will buy you many years' worth of access to the best available models via API.
Well, I unfortunately have expensive tastes in laptops anyway, so the delta is substantially less than $3k. It's also possible that from time to time I'll run across other things that benefit from the fancier machine, and if I don't get a 64GB Mac, one of the other possibilities is a 48GB Mac, which would still be able to run some local LLMs. All that said, though, it's still potentially a sizable chunk of money for a dubious benefit.
I've been assuming that privacy isn't the only benefit of local; it seems like a local model would offer more flexibility for fine-tuning, RAG, etc., though I'm completely ignorant of, e.g., what size of model it's actually feasible to usefully fine-tune on a given piece of hardware.
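From what I can gather, full fine-tuning is off the table at this scale (weights plus gradients plus optimizer state is several times the inference footprint), and the feasible option is LoRA/QLoRA-style training of small adapters over a quantized, frozen base model. Here's my back-of-envelope attempt, with the trainable fraction and per-parameter byte counts as rough guesses:

    # Rough QLoRA memory estimate: 4-bit frozen base weights plus fp32
    # adapter params, gradients and Adam moments (~16 bytes per
    # trainable param). Activations are ignored; figures are guesses.
    def qlora_gb(params_billions, trainable_fraction=0.01):
        base = params_billions * 1e9 * 4 / 8 / 1e9   # 4-bit base weights
        adapters = params_billions * 1e9 * trainable_fraction * 16 / 1e9
        return base + adapters

    for b in (8, 32, 70):
        print(f"{b}B QLoRA: ~{qlora_gb(b):.0f} GB before activations")

If that's in the right ballpark, a 48-64GB machine could plausibly fine-tune adapters for a ~30B model, while a 70B looks marginal.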
If you are worried about security or IP at all, it's preferable to run locally, or to spin up your own box running one of these models that you can query.
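And once it's running, querying it is trivial - a minimal sketch, assuming an Ollama-style server on its default port, where the model name is whatever you've pulled locally:

    # Query a locally hosted model; nothing leaves your machine.
    # Assumes an Ollama-style server listening on localhost:11434.
    import json, urllib.request

    def ask_local(prompt, model="llama3.3"):
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({"model": model, "prompt": prompt,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]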
I understand the reasons for not wanting to use a remote LLM. My question was about how useful local LLMs are. It might turn out that for some people remote LLMs are unacceptable for privacy reasons and local LLMs are unacceptable because they aren't good enough to be useful.
(This is a serious question, not poking fun; I am actually curious about this.)