Does that entire model fit in gpu memory? How's it run?
I tried running a model larger than my GPU's VRAM; it loads as many layers as fit onto the GPU and offloads the rest to the CPU. It's faster than CPU alone for me, but not by a lot.
Nice! Last time I tried out ROCm on Arch, a few years ago, it was a nightmare. Glad to see it's just one package install away these days, assuming you didn't do any setup beforehand.
1. sudo pacman -S ollama-rocm
2. ollama serve
3. ollama run deepseek-r1:32b
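If you want to check how much of the model actually landed on the GPU (relevant to the partial-offload question above), a quick sketch of the commands I'd reach for; `rocm-smi` assumes the ROCm tools are installed, and the exact column layout of `ollama ps` may vary by version:

```shell
# Check GPU VRAM usage via the ROCm tools (assumes rocm-smi is on PATH)
rocm-smi --showmeminfo vram

# While a model is loaded, ollama ps reports the CPU/GPU split,
# e.g. "100% GPU" or a percentage split when layers are offloaded
ollama ps
```

If `ollama ps` shows anything other than 100% GPU, some layers are running on the CPU, which matches the "faster, but not by a lot" behavior described above.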