
Interesting. How does one get this to run on an M-series MacBook Pro?


I'd recommend using llama.cpp and TheBloke's GGUF version of this model!

https://github.com/ggerganov/llama.cpp/ https://huggingface.co/TheBloke/MonadGPT-GGUF
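For anyone who hasn't done this before, the rough sequence looks like the sketch below. The quant filename (`monadgpt.Q4_K_M.gguf`) follows TheBloke's usual naming pattern but is an assumption here; check the "Files" tab of the Hugging Face repo for the actual names before downloading.

```shell
# Clone and build llama.cpp (Metal GPU acceleration is enabled by
# default on Apple Silicon macOS builds)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download one quant file from Hugging Face -- Q4_K_M is a common
# middle ground between size and quality. Filename is assumed; verify
# it against the repo's file listing.
curl -L -o monadgpt.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/MonadGPT-GGUF/resolve/main/monadgpt.Q4_K_M.gguf"

# Start an interactive chat session with the quantized model
./main -m monadgpt.Q4_K_M.gguf -n 256 --color -i
```

The `make` build picks up Metal automatically on M-series Macs, so no extra flags should be needed for GPU offload.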


Hi. TheBloke has quantized the model: https://huggingface.co/TheBloke/MonadGPT-GGUF You may be able to run the Q3 or Q4 variant, although in my experience quantization quality takes a hit on "weirder" data (which is the case here).


As the model is very small, you should be able to run any quantization level on an M-series MacBook with at least 16GB of RAM. The best speed/quality trade-off will probably be Q6_K: it loses little quality compared to Q8 but is noticeably faster.

Haven't tried this one specifically, but I always run 7B-parameter models on an M2 Pro with Q6_K or Q4_K_M (depending on how fast I want it).

See also the table in the README, which states that Q8 only needs ~10GB of RAM: https://huggingface.co/TheBloke/MonadGPT-GGUF
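The RAM figures in that table are easy to sanity-check with back-of-the-envelope arithmetic: file size is roughly parameter count times bits per weight. The bits-per-weight values below are approximate (the K-quants mix block sizes), and ~7.24B is the parameter count of Mistral-7B-class models; both are assumptions for illustration, not figures from the repo.

```shell
# Estimate GGUF file sizes for a ~7B model at common llama.cpp quant
# levels: size_GB = params * bits_per_weight / 8 / 1e9.
# Bits-per-weight values are approximate effective figures.
awk 'BEGIN {
  params = 7.24e9                 # assumed Mistral-7B-class param count
  bpw["Q3_K_M"] = 3.9
  bpw["Q4_K_M"] = 4.85
  bpw["Q6_K"]   = 6.56
  bpw["Q8_0"]   = 8.5
  for (q in bpw)
    printf "%s: ~%.1f GB file\n", q, params * bpw[q] / 8 / 1e9
  # Budget an extra couple of GB on top for KV cache and runtime
  # buffers, which is how a ~7.7 GB Q8 file ends up needing ~10GB RAM.
}'
```

The Q8_0 estimate comes out around 7.7 GB on disk, which is consistent with the README's ~10GB RAM figure once runtime overhead is added.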



