This is from a small model. 32B and 70B answer this correctly. "Arrowroot" too. Interestingly, 32B's "thinking" is a lot shorter and it seems to be more "sure". Could be because it's based on Qwen rather than LLaMA.
My models are both 4-bit. But yeah, that could be it: small models are much worse at tolerating quantization. That's why people use LoRA to recover some of the accuracy, even when they don't need domain adaptation.
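Roughly what that looks like in practice, as a minimal sketch (assuming the Hugging Face transformers + bitsandbytes + peft stack; the model id and LoRA hyperparameters here are just placeholders, not anything specific to the models above): load the base model in 4-bit, then attach small trainable LoRA matrices and fine-tune only those to claw back some of the quantization loss.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization config for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",   # placeholder model id; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)

# Small low-rank adapters on the attention projections; these stay in
# higher precision and are the only parameters that get trained.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of total params
```

From there you'd run a normal fine-tuning loop (e.g. with `Trainer` or `SFTTrainer`) on whatever data you care about; the point is just that the adapters compensate for the quantized base without touching its 4-bit weights.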