Hacker News

Within GGUF (and some other formats), each tensor gets its own quantisation type. Embedding layers, for example, are usually more sensitive to quantisation and so are often kept at Q8 or FP16 while the bulk of the weights sit at lower bit-widths. You can see the per-tensor types by running gguf-dump on a model file, or by clicking the GGUF icon on a model page on Hugging Face.
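A toy sketch of why this matters (this is plain per-tensor symmetric rounding, not GGUF's actual block-scaled formats): quantising the same weights at 8 bits versus 4 bits shows the precision gap that makes people keep sensitive tensors like embeddings at higher bit-widths.

```python
import numpy as np

def fake_quantize(w, bits):
    # Symmetric per-tensor quantisation: scale to ~2^(bits-1)-1 integer
    # levels, round, then dequantise back to float. A toy stand-in for
    # GGUF's block formats, which use a separate scale per small block.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

err8 = np.abs(w - fake_quantize(w, 8)).mean()  # Q8-like: ~255 levels
err4 = np.abs(w - fake_quantize(w, 4)).mean()  # Q4-like: ~15 levels

print(f"mean abs error at 8 bits: {err8:.5f}")
print(f"mean abs error at 4 bits: {err4:.5f}")
```

The 4-bit error is roughly an order of magnitude larger, which is tolerable for most weight matrices but noticeably hurts layers whose outputs feed everything downstream.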

