
For that GPU, the best Gemma 3 model you'll be able to run (with GPU-only inference) is the 4-bit quantized 12B-parameter model: https://ollama.com/library/gemma3:12b
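For example, here's a minimal sketch of querying it with the official `ollama` Python client (pip install ollama); it assumes the Ollama server is running locally and the model has already been pulled:

    import ollama

    # gemma3:12b is the 4-bit quantized 12B model, small enough
    # to fit entirely in VRAM on this GPU.
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response["message"]["content"])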

You could offload some of the layers to the CPU and run the 4-bit 27B model instead, but inference would be much slower.
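With Ollama that means capping the number of layers sent to the GPU via the num_gpu option. A sketch under the same assumptions as above; the layer count of 24 is just a placeholder you'd tune to your actual VRAM:

    import ollama

    # Offload only part of the 4-bit 27B model to the GPU; the
    # remaining layers run on the CPU, so generation is much slower.
    response = ollama.chat(
        model="gemma3:27b",
        messages=[{"role": "user", "content": "Hello"}],
        options={"num_gpu": 24},  # layers on GPU; tune to available VRAM
    )
    print(response["message"]["content"])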


