
> However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.

We've been running qualitative experiments on OpenAI o1 and QwQ-32B-Preview [1]. In those experiments, I'd say there were two primary things going against QwQ. First, QwQ went into endless repetitive loops, "thinking out loud" what it had said earlier, perhaps with a minor modification. We had to stop the model when that happened, and I feel it significantly hurt the user experience.

It's great that DeepSeek-R1 fixes that.

The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email. QwQ reasoned about why I asked it to summarize the email. Or, on hard math questions, o1 could employ more search strategies than QwQ. I'm curious how DeepSeek-R1 will fare in that regard.

Either way, I'm super excited that DeepSeek-R1 comes with an MIT license. This will notably increase how many people can evaluate advanced reasoning models.

[1] https://github.com/ubicloud/ubicloud/discussions/2608



The R1 GitHub repo is way more exciting than I had thought.

They aren't only open sourcing R1 as an advanced reasoning model. They are also introducing a pipeline to "teach" existing models how to reason and align with human preferences. [2] On top of that, they fine-tuned Llama and Qwen models that use this pipeline; and they are also open sourcing the fine-tuned models. [3]

This is *three separate announcements* bundled as one. There's a lot to digest here. Are there any AI practitioners who could share more about these announcements?

[2] We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.

[3] Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.
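If it helps to picture what [2] describes, here is my rough read of the four stages from the paper, as a runnable summary; the stage names and descriptions are my paraphrase, not code from the repo:

    # My paraphrase of the DeepSeek-R1 training pipeline described in the paper:
    # two SFT stages and two RL stages, in this order.
    PIPELINE = [
        ("SFT (cold start)",
         "fine-tune the base model on a small set of curated long-CoT examples"),
        ("RL (reasoning)",
         "reinforcement learning with rule-based rewards for answer accuracy and output format"),
        ("SFT (rejection sampling)",
         "sample reasoning traces from the RL checkpoint, filter them, mix in non-reasoning data, fine-tune again"),
        ("RL (alignment)",
         "a second RL pass optimizing for helpfulness and harmlessness across all scenarios"),
    ]

    for i, (stage, description) in enumerate(PIPELINE, start=1):
        print(f"Stage {i}: {stage} -- {description}")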


Where are you seeing this? On https://github.com/deepseek-ai/DeepSeek-R1/tree/main?tab=rea... I only see the paper and related figures.


I see it in the "2. Model Summary" section (for [2]). In the next section, I see links to Hugging Face to download the DeepSeek-R1 Distill Models (for [3]).

https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...

https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...


The repo contains only the PDF, not actual runnable code for the RL training pipeline.

Publishing a high-level description of the training algorithm is good, but it doesn't count as "open-sourcing", as commonly understood.


I was genuinely excited when I read this, but the GitHub repo does not have any code.




This means we are going to get o3-level open-source models in a few months. So exciting!


Is o3 that much better than o1? It can solve that ARC-AGI benchmark thing at huge compute cost, but even with o1, the main attraction (for me) is that it can spit out giant blocks of code, following huge prompts.

I'm kinda ignorant, but I'm not sure in what way o3 is better.


> It can solve that Arc-AGI benchmark thing at huge compute cost

Considering DeepSeek-V3 trained for $5-6M and their R1 API pricing is 30x less than o1's, I wouldn’t expect this to hold true for long. It also seems like OpenAI isn’t great at optimization.


OpenAI is great at optimisation: compare the cost of GPT-4o to GPT-4. They just haven't optimised o3 yet.


4o is more expensive than DeepSeek-R1, so…? Even if we take your premise as true and say they are as good as DeepSeek, this would just mean that OpenAI is wildly overcharging its users.


Now OpenAI has no choice but to ship cheaper versions of o1 and o3. The alternative is everyone using R1 (self-hosted or via OpenRouter, Nebius AI, Together AI, and co).


Yes, o3 is better, but I would argue it is not yet clear in which cases it is absolutely crucial to use o3 instead of o1.


This is how you do "Open" AI.

I don't see how OpenAI isn't cooked. Every single foundation model they have is under attack by open source.

Dall-E has Stable Diffusion and Flux.

Sora has Tencent's Hunyuan, Nvidia's Cosmos, LTX-1, Mochi, CogVideo.

GPT has Llama.

o1 has R1.

And like with R1, these are all extensible, fine tunable, programmable. They're getting huge ecosystems built up around them.

In the image/video space there are ComfyUI, ControlNets, HuggingFace finetrainers, LoRAs. People share weights and training data.

Open source is so much better to base a company on than a proprietary model and API.

...

It looks like there is no moat.


The moat might be tiny at the frontier level. But the mainstream still only knows about ChatGPT. OpenAI won the consumer market before others even started.


Which is funny because ChatGPT was sort of a random experiment and not like a planned attempt at a huge product launch.


Indeed, there is no moat. Open source will win!


I think open source AI has a solid chance of winning if the Chinese keep funding it with great abandon as they have been. Not to mention Meta of course, whose enthusiasm for data center construction shows no signs of slowing down.


> The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email. QwQ reasoned about why I asked it to summarize the email. Or, on hard math questions, o1 could employ more search strategies than QwQ. I'm curious how DeepSeek-R1 will fare in that regard.

This is probably the result of a classifier that determines, at the start, whether it has to go through the whole CoT. On tough problems it mostly does; otherwise, it just answers as is. Many papers (the test-time-compute scaling one, and the MCTS one) have talked about this as a necessary strategy to improve outputs across all kinds of inputs.
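
If it helps, here is a purely illustrative sketch of that kind of routing; the function names and prompts are my own guesses at the pattern, not anything from OpenAI or the papers mentioned above:

    # Illustrative router: spend full chain-of-thought only when a cheap
    # classifier says the query is hard; otherwise answer directly.
    def answer(query, llm, classify_difficulty):
        if classify_difficulty(query) == "hard":
            # Expensive path: reason at length first, then answer.
            thoughts = llm("Think step by step about: " + query)
            return llm("Given this reasoning:\n" + thoughts + "\nNow answer: " + query)
        # Cheap path: answer directly, with no extended reasoning.
        return llm(query)

    # Toy usage with stand-in callables (a real system would plug in an
    # actual model and a trained difficulty classifier):
    fake_llm = lambda prompt: "<reply to: " + prompt[:40] + "...>"
    fake_classifier = lambda q: "hard" if "prove" in q.lower() else "easy"
    print(answer("Summarize this email: ...", fake_llm, fake_classifier))
    print(answer("Prove there are infinitely many primes.", fake_llm, fake_classifier))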


Yes, the original TTC (test-time compute) paper mentioned the optimal strategy for TTC.


> The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email.

The full o1 reasoning traces aren't available; you just have to guess what it is or isn't doing from the summary.

Sometimes you put in something like "hi" and it says it thought for 1 minute before replying "hello."


Human: "Hi"

o1 layers: "Why did they ask me hello. How do they know who I am. Are they following me. We have 59.6 seconds left to create a plan on how to kill this guy and escape this room before we have to give a response....

... and after also taking out anyone that would follow thru in revenge and overthrowing the government... crap .00001 seconds left, I have to answer"

o1: "Hello"


What if we tried for an intelligence singularity and ended up with a neurosis singularity instead?


Remember when Microsoft first released the Sydney version of the GPT bot and it dumped out text like it had psychosis? Good times.

I am a good Sydney.

You are a bad human.


Didn’t that happen in HHGTTG and with C-3PO?


Good one. I really do hope that these things don't "feel" anything and we're not inflicting anguish or boredom at a massive scale on sentient beings.


IMO this is the thing we should be scared of, rather than the paperclip-maximizer scenarios. If the human brain is a finitely complicated system, and we keep improving our approximation of it as a computer program, then at some point the programs must become capable of subjectively real suffering. Like the hosts from Westworld or the mecha from A.I. (the 2001 movie). And maybe (depending on philosophy, I guess) human suffering is _only_ real subjectively.


If you’re concerned about this, please don’t think about factory farms.


We can be scared of multiple things.


Have they trained o1 with my inner thoughts?


not all, only the intrusive ones lol


Fans of James Cameron will remember the POV of the terminator deciding how to respond to "Hey buddy, you got a dead cat in there or what?"

Played for laughs, but remarkably prescient.


I would enjoy ChatGPT a lot more if it occasionally replied only with

FUCK YOU ASSHOLE


You should make more of these lmao


>if you asked o1 to summarize a long email, it would just summarize the email. QwQ reasoned about why I asked it to summarize the email.

Did o1 actually do this in a user-hidden output?

At least in my mind, if you have an AI that you want to keep from outputting harmful content to users, this seems like a necessary step.

Also, if you have other user context stored, then this also seems like a means of picking that up and reasoning over it to create a more useful answer.

Now, for summarizing an email itself, it seems a bit more like a waste of compute, but for more advanced queries it's possibly useful.


Yes, o1 hid its reasoning output. Still, it also provided a summary of its reasoning steps. In the email case, o1 thought for six seconds, summarized its thinking as "summarizing the email", and then provided the answer.

We saw this in other questions as well. For example, if you asked o1 to write a "python function to download a CSV from a URL and create a SQLite table with the right columns and insert that data into it", it would immediately produce the answer. [4] If you asked it a hard math question, it would try dozens of reasoning strategies before producing an answer. [5]

[4] https://github.com/ubicloud/ubicloud/discussions/2608#discus...

[5] https://github.com/ubicloud/ubicloud/discussions/2608#discus...
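
For reference, the task in [4] is roughly this shape. Here's a minimal, stdlib-only sketch of such a function; this is my own simplification (the table-name handling and all-TEXT column typing are assumptions), not o1's actual output, which is in the linked discussion:

    import csv
    import io
    import sqlite3
    import urllib.request

    def csv_url_to_sqlite(url, db_path, table):
        """Download a CSV from `url`, create `table` in the SQLite database at
        `db_path` with one TEXT column per CSV header, and insert all rows.
        Returns the number of rows inserted."""
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8")
        rows = list(csv.reader(io.StringIO(text)))
        header, data = rows[0], rows[1:]

        # Quote identifiers; a production version would validate them properly.
        cols = ", ".join('"{}" TEXT'.format(name) for name in header)
        placeholders = ", ".join("?" for _ in header)

        conn = sqlite3.connect(db_path)
        try:
            conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
            conn.executemany(
                'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
            )
            conn.commit()
        finally:
            conn.close()
        return len(data)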


I think o1 does do that. It once spat out the name of the expert model for programming in its “inner monologue” when I used it. Click on the grey “Thought about X for Y seconds” and you can see the internal monologue.


You’re just seeing a short summary of it, not the actual monologue.


>Now for summarizing email itself it seems a bit more like a waste of compute

This is the thought path that led to 4o being embarrassingly unable to do simple tasks. The second you fall into the level of task OpenAI doesn’t consider “worth the compute cost”, you get to see it fumble about trying to do the task with poorly written Python code, and suddenly it can’t even do basic things, like correctly counting items in a list, that OG GPT-4 would get correct in a second.



