You are correct. This project is "on the CPU", so it will not utilize your GPU for computation. If you would like to try out a Rust framework that does support GPUs, Candle (https://github.com/huggingface/candle/tree/main) may be worth exploring.
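For a taste, here's essentially the hello-world from Candle's README, pointed at a CUDA device instead of the CPU (a sketch; assumes you build with Candle's `cuda` feature and have a CUDA-capable card):

    use candle_core::{Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Select the first CUDA GPU (requires the `cuda` feature at build time).
        let device = Device::new_cuda(0)?;

        let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
        let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;

        // The matmul runs on the GPU because both tensors live on `device`.
        let c = a.matmul(&b)?;
        println!("{c}");
        Ok(())
    }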
It's all implemented on the CPU, yes; there's no GPU acceleration whatsoever (at the moment, at least).
> if I have a good GPU, I should look for alternatives.
If you actually want to run it, even just on the CPU, you should look for an alternative (and the alternative is called llama.cpp). This is more of an educational resource about how things work when you remove all the layers of complexity in the ecosystem.
LLMs are somewhat magical in how effective they can be, but in terms of code they're really simple.
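To give a sense of the "really simple" part: strip away the tensor math and the top-level inference loop is just repeated next-token prediction. A toy sketch (the `model` closure here is a hypothetical stand-in for a real transformer forward pass, not any actual crate's API):

    // Greedy decoding: repeatedly ask the model for logits and take the argmax.
    fn generate(model: &impl Fn(&[u32]) -> Vec<f32>, prompt: &[u32], max_new: usize) -> Vec<u32> {
        let mut tokens = prompt.to_vec();
        for _ in 0..max_new {
            // Forward pass: one score (logit) per vocabulary entry.
            let logits = model(&tokens);
            // Pick the highest-scoring next token and append it.
            let next = logits
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(i, _)| i as u32)
                .unwrap();
            tokens.push(next);
        }
        tokens
    }

    fn main() {
        // Toy "model": always scores token 2 highest (vocabulary of 4).
        let model = |_tokens: &[u32]| vec![0.1, 0.3, 0.9, 0.2];
        println!("{:?}", generate(&model, &[0, 1], 3)); // [0, 1, 2, 2, 2]
    }

Everything hard lives inside that forward pass, and even that is mostly matrix multiplications.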
For Rust you have llama.cpp wrappers like llm_client (mine), and the Candle-based projects mistral.rs and Kalosm.
Although my project does try to provide a mistral.rs implementation, I haven't fully migrated from llama.cpp. A full-Rust implementation would be nice for quick install times (among other reasons). Right now my crate has to clone and build llama.cpp; it's automated for Mac, PC, and Linux, but it adds about a minute of build time.
An RTX 3090 (as one example) has nearly 1 TB/s of memory bandwidth. You'd need at least 12 channels of the fastest proof-of-concept DDR5 on the planet to equal that.
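Back-of-the-envelope, with approximate numbers (the 3090's GDDR6X is spec'd at about 936 GB/s, and a 64-bit DDR5 channel moves MT/s x 8 bytes):

    fn main() {
        // RTX 3090: 19.5 Gbps/pin * 384-bit bus / 8 bits = ~936 GB/s.
        let gpu_bw = 936.0_f64;
        // One DDR5 channel at 9800 MT/s (roughly the fastest demoed parts):
        // 9800 MT/s * 8 bytes = 78.4 GB/s.
        let ddr5_channel = 9800.0 * 8.0 / 1000.0;
        // How many channels to match the GPU?
        println!("{}", (gpu_bw / ddr5_channel).ceil()); // 12
    }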
If you have a discrete GPU, use an implementation that utilizes it; it's a completely different story.
Apple Silicon boasts impressive numbers on LLM inference because it has a unified high-bandwidth CPU-GPU memory architecture (400 GB/s IIRC).
Depends. Good models are big and require a lot of memory. Even the 4090 doesn't have that much memory in an LLM context. So your GPU will be faster, but likely can't fit the big models.
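Rough math (weights only; the KV cache adds more on top):

    fn main() {
        // Weight memory is roughly parameter count * bytes per parameter.
        // A 4090 has 24 GB of VRAM.
        for (name, params_b) in [("7B", 7.0_f64), ("13B", 13.0), ("70B", 70.0)] {
            let fp16 = params_b * 2.0; // 2 bytes/param
            let q4 = params_b * 0.5;   // ~4 bits/param after quantization
            println!("{name}: ~{fp16:.0} GB at fp16, ~{q4:.1} GB at 4-bit");
        }
    }

A 70B model needs ~140 GB at fp16 and ~35 GB even at 4-bit, so it won't fit in the 4090's 24 GB, while a quantized 7B or 13B fits comfortably.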
I am building something that would probably benefit from this, but with that price tag (solo indie dev) that's going to be a big ask! It might be worth it; there's just no way of knowing without trying it first.
Theirs is a commercial project with an MIT-licensed client. I made an open-source version of their project. (Their backend is closed source; I have no knowledge of it.)