oh? can you point out where i can get the r1 model to run locally, please? because looking at the directory here there's a 200B model and then deepseek v3 is the latest (16 days ago) with no GGUF (yet), and everything else is instruct or coder.
so to put it another way, the people telling me i'm holding it wrong actually don't have any clue what they're asking for?
p.s. there is no "local r1" so you gotta do a distill.
we were talking about self-hosting. the deepseek-r1 is 347-713GB depending on quant. No one is running deepseek-r1 "locally, self hosted".
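Rough back-of-the-envelope math on where numbers like that come from (this assumes the ~671B-parameter R1 release and ignores GGUF metadata overhead, so treat it as an estimate, not exact file sizes):

    # rough size estimate for the full deepseek-r1 weights at different quant levels
    # assumes ~671B parameters; real GGUF files add some overhead on top of this
    PARAMS = 671e9

    for name, bits in [("~4.5-bit quant", 4.5), ("8-bit quant", 8), ("FP16", 16)]:
        gb = PARAMS * bits / 8 / 1e9
        print(f"{name:>15}: ~{gb:,.0f} GB just for the weights")

That's hundreds of GB before you even think about KV cache, which is why "just run it locally" isn't a real answer for most people.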
If people want to argue with me, i wish we'd all stick to what we're talking about, instead of saying "but you technically can if you use someone else's hardware". That's not self-hosted. I self-host a deepseek-r1 distill, locally, on my computer.
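For what it's worth, this is roughly all it takes on my end. A minimal sketch, assuming Ollama is serving on its default localhost:11434 port and you've already pulled a distill tag (the "deepseek-r1:14b" tag here is just an example, swap in whatever distill you actually run):

    # minimal sketch: query a locally self-hosted distill through Ollama's HTTP API
    # assumes ollama is running on the default localhost:11434 and the model tag
    # below has already been pulled with `ollama pull`
    import json
    import urllib.request

    payload = json.dumps({
        "model": "deepseek-r1:14b",  # example distill tag, not the full r1
        "prompt": "Explain the difference between a distill and the foundational model.",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])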
It is deepseek, it's just been hand-distilled by someone using a different tool. The full deepseek-r1 will get chopped down to a fraction (1/8th or so) of its size and it won't be called "deepseek-r1" - that's what they call a "foundational model" - and then we'll see the 70B and the 30B and the 16B "deepseek-r1 distills".
next to no one who messes with this stuff uses foundational or distilled foundational models. Who's still using llama-3.2? Yeah, it's good, it's fine, but there are mixes and MoE and CoT models that use llama as the base, and they're better.
there is no gguf for running it locally, self-hosted. Yes, if you have a DC (datacenter) card you can download the full weights and run something, but that's different from self-hosting a local 30B (for example).
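To be concrete about what self-hosting a local 30B-class model looks like, here's a minimal sketch using llama-cpp-python. The model path is a placeholder, not a real file, and the settings are just the obvious knobs:

    # minimal sketch: run a ~30B-class GGUF quant locally with llama-cpp-python
    # the model path is a hypothetical placeholder -- point it at whatever quant you downloaded
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-30b-distill.Q4_K_M.gguf",  # placeholder filename
        n_ctx=4096,        # context window
        n_gpu_layers=-1,   # offload all layers to the GPU if it fits, lower this if not
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize what a GGUF quant is."}]
    )
    print(out["choices"][0]["message"]["content"])

That fits on a single consumer GPU plus some RAM. The full-weight run the other replies keep pointing at does not, which is the whole disagreement.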
I don't really understand what's different between self-hosting with Ollama and self-hosting by running the full weights. I get that Ollama is easier, but you can still self-host the full one?