
The easy way: download koboldcpp. Otherwise you have to compile llama.cpp (or koboldcpp) yourself with OpenCL or CUDA support; there are instructions for this on the GitHub page.
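
Roughly, a llama.cpp build looks like this (a sketch only; the build flags have changed between versions, so check the current README before copying):

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    # CUDA build (NVIDIA cards):
    make LLAMA_CUBLAS=1
    # or an OpenCL build via CLBlast:
    make LLAMA_CLBLAST=1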

Then offload as many layers as you can to the GPU with the GPU-layers flag (--n-gpu-layers / -ngl in llama.cpp, --gpulayers in koboldcpp). You'll have to experiment with the number while watching your GPU's VRAM usage.
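
Something like this, where the model filename and the 32-layer count are just placeholders (start lower and raise the count until you're near your VRAM limit):

    ./main -m models/llama-2-7b.Q4_K_M.gguf --n-gpu-layers 32 -p "Hello"
    # in another terminal, watch VRAM usage refresh every second:
    nvidia-smi -l 1

If you overshoot and hit out-of-memory errors, drop the layer count and try again.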



