I'm impressed not only by how good deepseek r1 is, but also by how good the smaller distillations are. The qwen-based 7b distillation of deepseek r1 is a great model too.
the 32b distillation just became the default model for my home server.
Depends on the quant used and the context size. On a 24gb card you should be able to load about a 5-bit quant if you keep the context small.
In general, if you're using 8-bit quantization, which is virtually lossless, any dense model will require roughly as many gigabytes of memory as it has billions of params with a small context, and a bit more as you increase the context.
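A rough sketch of that arithmetic (weights only; the function name and numbers here are illustrative, and real usage is somewhat higher once you add runtime overhead and the KV cache):

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only memory: each param takes bits_per_weight / 8 bytes,
    so N billion params cost roughly N * bits / 8 gigabytes."""
    return params_billions * bits_per_weight / 8

# A 32B dense model at 8-bit needs ~32 GB just for weights,
# while a 5-bit quant drops that to ~20 GB, leaving headroom
# on a 24 GB card for a small context.
print(estimate_weights_gb(32, 8))  # 32.0
print(estimate_weights_gb(32, 5))  # 20.0
```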
i can’t think of a single commercial use case, outside of education, where that’s even relevant. But i agree it’s messed up from an ethical / moral perspective.
i wouldn’t use AI for negotiating with a business, period. I’d hire a professional human with real hands-on experience working with chinese businesses.
seems like a weird thing to use AI for, regardless of who created the model.
i think censorship in both American and Chinese models is done by private actors out of fear of external repercussions, not because it is explicitly mandated.
Luckily in the US the govt can do no such thing due to the 1st amendment, so it only takes a relevant billionaire to get a model with different political views.