
Yes, you can run inference at decent speeds on CPU with llama.cpp. A token is about 0.75 words, so you can see lots of people getting 4-8 words/s on their CPUs: https://github.com/ggerganov/llama.cpp/issues/34
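
To make the conversion concrete, here's a tiny C sketch that turns the tokens/s numbers llama.cpp reports into rough words/s, assuming the ~0.75 words-per-token rule of thumb above (the sample speeds are just hypothetical placeholders):

    #include <stdio.h>

    int main(void) {
        const double words_per_token = 0.75;   /* rule-of-thumb assumption */
        double tokens_per_s[] = {5.0, 10.0};   /* hypothetical measured speeds */
        for (int i = 0; i < 2; i++) {
            printf("%.1f tok/s ~= %.1f words/s\n",
                   tokens_per_s[i], tokens_per_s[i] * words_per_token);
        }
        return 0;
    }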

There are a lot of optimizations that can still be done. Here's one with a potential 15X AVX speedup, for example: https://github.com/ggerganov/llama.cpp/pull/996
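
For a sense of why AVX helps so much: most of the time goes into dot products inside the matrix multiplies, and wide SIMD lets you process 8 floats per instruction instead of 1. This is a minimal illustrative sketch of that idea, not the actual code from that PR (compile with -mavx):

    #include <immintrin.h>
    #include <stdio.h>

    /* Dot product of two float arrays, 8 elements at a time with AVX. */
    static float dot_avx(const float *a, const float *b, int n) {
        __m256 acc = _mm256_setzero_ps();
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        /* horizontal sum of the 8 accumulator lanes */
        float tmp[8];
        _mm256_storeu_ps(tmp, acc);
        float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3]
                  + tmp[4] + tmp[5] + tmp[6] + tmp[7];
        /* scalar tail for leftover elements */
        for (; i < n; i++) sum += a[i] * b[i];
        return sum;
    }

    int main(void) {
        float a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float b[10] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        printf("%f\n", dot_avx(a, b, 10));  /* expect 55.0 */
        return 0;
    }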


