
Interesting. How does one get this to run on an M-series MacBook Pro?


I'd recommend using llama.cpp and TheBloke's GGUF version of this model!

https://github.com/ggerganov/llama.cpp/ https://huggingface.co/TheBloke/MonadGPT-GGUF
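For anyone who hasn't done this before, the rough sequence looks like the sketch below. The quant filename (`monadgpt.Q4_K_M.gguf`) follows TheBloke's usual naming pattern but is an assumption here; check the "Files" tab of the Hugging Face repo for the actual names before downloading.

```shell
# Clone and build llama.cpp (Metal GPU acceleration is enabled by
# default on Apple Silicon macOS builds)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download one quant file from Hugging Face -- Q4_K_M is a common
# middle ground between size and quality. Filename is assumed; verify
# it against the repo's file listing.
curl -L -o monadgpt.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/MonadGPT-GGUF/resolve/main/monadgpt.Q4_K_M.gguf"

# Start an interactive chat session with the quantized model
./main -m monadgpt.Q4_K_M.gguf -n 256 --color -i
```

The `make` build picks up Metal automatically on M-series Macs, so no extra flags should be needed for GPU offload.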


Hi. TheBloke has quantized the model: https://huggingface.co/TheBloke/MonadGPT-GGUF You may be able to run the Q3 or Q4 variant, although in my experience quantization quality takes a hit on "weirder" data (which is the case here).


As the model is very small, you should be able to run any quantization level on an M-series MacBook with at least 16GB of RAM. The best speed/quality trade-off will probably be Q6_K: it loses little quality compared to Q8 but is noticeably faster.

Haven't tried this one specifically, but I always run 7B-parameter models on an M2 Pro with Q6_K or Q4_K_M (depending on how fast I want it).

See also the table in the README, which states that Q8 only needs ~10GB of RAM: https://huggingface.co/TheBloke/MonadGPT-GGUF
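The RAM figures in that table are easy to sanity-check with back-of-the-envelope arithmetic: file size is roughly parameter count times bits per weight. The bits-per-weight values below are approximate (the K-quants mix block sizes), and ~7.24B is the parameter count of Mistral-7B-class models; both are assumptions for illustration, not figures from the repo.

```shell
# Estimate GGUF file sizes for a ~7B model at common llama.cpp quant
# levels: size_GB = params * bits_per_weight / 8 / 1e9.
# Bits-per-weight values are approximate effective figures.
awk 'BEGIN {
  params = 7.24e9                 # assumed Mistral-7B-class param count
  bpw["Q3_K_M"] = 3.9
  bpw["Q4_K_M"] = 4.85
  bpw["Q6_K"]   = 6.56
  bpw["Q8_0"]   = 8.5
  for (q in bpw)
    printf "%s: ~%.1f GB file\n", q, params * bpw[q] / 8 / 1e9
  # Budget an extra couple of GB on top for KV cache and runtime
  # buffers, which is how a ~7.7 GB Q8 file ends up needing ~10GB RAM.
}'
```

The Q8_0 estimate comes out around 7.7 GB on disk, which is consistent with the README's ~10GB RAM figure once runtime overhead is added.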



