
I mean... yes. The DeepSeek announcement puts R1 right there in the name for those models. https://api-docs.deepseek.com/news/news250120

It's fairly clear that R1-Llama or R1-Qwen is a distill, and they're all coming directly from DeepSeek.

As an aside, at least the larger distilled models (I'm mostly running r1-llama-distill-70b) are definitely not the same thing as the base llama/qwen models. I'm getting better results locally, admittedly with slower inference, since the model works through the whole "<think>" section first.

Surprisingly, the content in the <think> section is actually quite useful on its own. If you're using the model to spitball or brainstorm, getting to see it do that process is just flat-out useful. Sometimes more so than the actual answer it finally produces.
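For anyone consuming the raw output programmatically, here's a minimal sketch of how you might separate the <think> reasoning from the final answer. (`split_think` is a hypothetical helper, not part of any DeepSeek API; it just assumes the response wraps its reasoning in <think>...</think> tags before the answer, which is how the R1 models format their output.)

```python
import re

def split_think(response: str):
    """Split an R1-style response into (reasoning, answer).

    reasoning is None when no <think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not match:
        return None, response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Hypothetical example output from a distilled R1 model:
sample = "<think>The user wants a summary, so list the key points.</think>Here are the key points."
reasoning, answer = split_think(sample)
print(reasoning)  # The user wants a summary, so list the key points.
print(answer)     # Here are the key points.
```

That way you can log or display the reasoning separately, which is handy when the brainstorming in <think> is the part you actually wanted.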



