
> There are some pretty powerful models that will run on an Nvidia 4090 w/ 24 GB of RAM. Devstral and Qwen 3.

I'd caution against using Devstral on a 24 GB VRAM budget. Heavy quantisation (the only way to make it fit into 24 GB) degrades it noticeably. There are lots of reports on r/LocalLLaMA of subpar results, especially with KV cache quantisation.

We've had good experiences running it at FP8 with a full-precision (unquantised) KV cache, but going lower than that hurts quality a lot.
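
For a rough sense of why 24 GB is tight, here's a back-of-envelope sketch. It assumes Devstral is roughly a 24B-parameter model (my assumption, not a figure from this thread) and counts weights only, so the KV cache and activations come on top. `weight_gib` is just a hypothetical helper for the arithmetic, and real 4-bit formats usually spend a bit more than 4 bits per weight.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


PARAMS_B = 24.0  # assumption: ~24B parameters
VRAM_GIB = 24.0  # single 24 GB card, e.g. a 4090

for name, bits in [("fp16", 16), ("fp8", 8), ("4-bit", 4)]:
    weights = weight_gib(PARAMS_B, bits)
    headroom = VRAM_GIB - weights  # what remains for KV cache and activations
    print(f"{name:>5}: weights ~{weights:.1f} GiB, headroom ~{headroom:.1f} GiB")
```

Under those assumptions, FP8 weights alone nearly fill the card, which leaves very little for a full-precision cache at any useful context length. That's why fitting in 24 GB pushes you toward lower-bit weights and a quantised cache.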


