
I'm impressed by not only how good deepseek r1 is, but also how good the smaller distillations are. qwen-based 7b distillation of deepseek r1 is a great model too.

the 32b distillation just became the default model for my home server.



I just tried the distilled 8b Llama variant, and it had very poor prompt adherence.

It also reasoned its way to an incorrect answer, to a question plain Llama 3.1 8b got fairly correct.

So far not impressed, but will play with the qwen ones tomorrow.


not adhering to system prompts is even officially mentioned as one of the caveats of the distilled models

I wonder if this has to do with their censorship agenda, but others report that it can be easily circumvented.


I didn't have time to dig into the details of the models, but that makes sense I guess.

I tried the Qwen 7B variant and it was indeed much better than the base Qwen 7B model at various math word problems.


How much VRAM is needed for the 32B distillation?


Depends on the quant used and the context size. On a 24gb card you should be able to load about a 5-bit quant if you keep the context small.

In general, if you're using 8-bit, which is virtually lossless, any dense model will require roughly as many GB as it has billions of params with a small context, and a bit more as you increase context.
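As a rough back-of-the-envelope sketch of that rule (the `overhead_gb` allowance for KV cache and buffers is an assumption, and real usage varies by runtime and context length):

```python
def vram_estimate_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
    """Rough VRAM needed to load a dense model's weights.

    params_b: parameter count in billions; bits_per_weight: quantization level.
    overhead_gb is a loose allowance for a small context's KV cache and buffers.
    """
    weights_gb = params_b * bits_per_weight / 8  # billions of params * bits / 8 bits-per-byte ~= GB
    return weights_gb + overhead_gb

# A 32B model at 8-bit needs ~32 GB for weights alone; at 5-bit it's ~20 GB,
# which is why a 24 GB card only fits it at lower-bit quants with small context.
print(vram_estimate_gb(32, 8))  # 33.5
print(vram_estimate_gb(32, 5))  # 21.5
print(vram_estimate_gb(32, 4))  # 17.5
```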


Depends on compression; I think 24gb can hold a 32B at around 3-4 bit quantization.


I had no problems running the 32b at q4 quantization with 24GB of ram.


can I ask, what do you do with it on your home server?


tried the 7b, it switched to Chinese mid-response


Assuming you're doing local inference, have you tried setting a token filter on the model?
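A minimal sketch of how such a filter could work, with a toy vocabulary standing in for a real tokenizer (the names and the `{token_id: text}` map are illustrative, not any particular library's API): ban every token id whose decoded text contains CJK characters, then feed that ban list into whatever logit-bias or logits-processor hook your inference stack exposes.

```python
def contains_cjk(text: str) -> bool:
    """True if text contains CJK ideographs (the main Han Unicode ranges)."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
        for ch in text
    )

def banned_token_ids(vocab: dict) -> set:
    """Token ids whose surface form contains CJK characters.

    vocab is a toy {token_id: decoded_text} map; with a real tokenizer you'd
    build it by decoding every id in the vocabulary once at startup.
    """
    return {tid for tid, text in vocab.items() if contains_cjk(text)}

# Toy vocabulary mixing English and Chinese tokens.
toy_vocab = {0: "hello", 1: "你好", 2: " world", 3: "模型"}
print(sorted(banned_token_ids(toy_vocab)))  # [1, 3]
```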


Great as long as you’re not interested in Tiananmen Square or the Uighurs.


I just tried asking ChatGPT how many civilians Israel murdered in Gaza. It didn't answer.


A is wrong but that’s fine because B also is.


Does Israel make ChatGPT?


i can’t think of a single commercial use case, outside of education, where that’s even relevant. But i agree it’s messed up from an ethical / moral perspective.


Well those are the overt political biases. Would you trust DeepSeek to advise on negotiating with a Chinese business?

I’m no xenophobe, but seeing the internal reasoning of DeepSeek explicitly planning to ensure alignment with the government gives me pause.


i wouldn’t use AI for negotiating with a business, period. I’d hire a professional human that has real hands-on experience working with Chinese businesses.

seems like a weird thing to use AI for, regardless of who created the model.


Interesting. I want my AI tools to be suitable for any kind of brainstorming or iteration.

But yeah if you’re scoping your uses to things where you’re sure a government-controlled LLM won’t bias results, it should be fine.


Yeah i can definitely see some situations where i’d be wary, i agree with you. Wouldn’t impact my work but i see the concern.

I use LLM’s for technical solution brainstorming, rubber-ducking technical problems, and learning (software languages, devops, software design, etc.)

Your mileage will vary of course!


american models have their own bugbears like around evolution and intellectual property


For sensitive topics, it is good that we can now cross-ask Grok, DeepSeek and ChatGPT to avoid any kind of bias or no-reply answers.


The censorship is not present in the distilled models which you can run locally


Have you even tried it out locally and asked about those things?



so, no


try asking US models about the influence of Israeli diaspora on funding genocide in Gaza then come back


Which American models? Are you suggesting the US government exercises control over US LLMs the way the CCP controls DeepSeek outputs?


i think both American and Chinese model censorship is done by private actors out of fear of external repercussion, not because it is explicitly mandated to them


Oh wow.

Sorry, no. DeepSeek’s reasoning outputs specifically say things like “ensuring compliance with government viewpoints”


Meta just replaced its public policy officer to pander to the new administration. American companies work hard to align with the American government.


CCP requires models to follow "socialist values".

https://www.cnbc.com/amp/2024/07/18/chinese-regulators-begin...


And the EU requires models to follow "democratic liberal values" according to their AI Act. Other side of the same coin.


Luckily in the US the govt can do no such things due to the 1st amendment, so it only takes a relevant billionaire to get a model with different political views.


One of Meta's policy officials (Jordana Cutler) is a former Israeli government official who was censoring anti-genocide content online.

American models are full of censorship. Just different stuff.



