I'm impressed not only by how good deepseek r1 is, but also by how good the smaller distillations are. The qwen-based 7b distillation of deepseek r1 is a great model too.
the 32b distillation just became the default model for my home server.
Depends on the quant used and the context size. On a 24gb card you should be able to load about a 5-bit quant if you keep the context small.
In general, if you're using 8-bit quantization, which is virtually lossless, any dense model will require roughly as many gigabytes of memory as it has billions of params with a small context, and a bit more as you increase the context.
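A rough sketch of that arithmetic (weights only; the function name and numbers here are illustrative, and real usage is somewhat higher once you add runtime overhead and the KV cache):

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only memory: each param takes bits_per_weight / 8 bytes,
    so N billion params cost roughly N * bits / 8 gigabytes."""
    return params_billions * bits_per_weight / 8

# A 32B dense model at 8-bit needs ~32 GB just for weights,
# while a 5-bit quant drops that to ~20 GB, leaving headroom
# on a 24 GB card for a small context.
print(estimate_weights_gb(32, 8))  # 32.0
print(estimate_weights_gb(32, 5))  # 20.0
```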
i can’t think of a single commercial use case, outside of education, where that’s even relevant. But i agree it’s messed up from an ethical / moral perspective.
i wouldn’t use AI for negotiating with a business, period. I’d hire a professional human with real hands-on experience working with chinese businesses.
seems like a weird thing to use AI for, regardless of who created the model.
i think censorship in both American and Chinese models is done by private actors out of fear of external repercussions, not because it is explicitly mandated.
Luckily in the US the govt can do no such thing due to the 1st amendment, so it only takes a relevant billionaire to get a model with different political views.