> However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.
We've been running qualitative experiments on OpenAI o1 and QwQ-32B-Preview [1]. In those experiments, I'd say there were two primary things going against QwQ. First, QwQ went into endless repetitive loops, "thinking out loud" what it had said earlier, perhaps with a minor modification. We had to stop the model when that happened, and I feel that significantly hurt the user experience.
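To make "we had to stop the model" concrete, here's a minimal sketch of the kind of repetition guard you can wrap around a streamed response. The `generate_stream(prompt)` token iterator, the n-gram window, and the threshold are illustrative assumptions, not our actual harness:

```python
def detect_repetition(text: str, ngram: int = 8, max_repeats: int = 3) -> bool:
    """Return True if the trailing n-gram of `text` already appeared
    `max_repeats` or more times earlier in the output."""
    tokens = text.split()
    if len(tokens) < ngram:
        return False
    tail = tuple(tokens[-ngram:])
    count = sum(
        1
        for i in range(len(tokens) - ngram)
        if tuple(tokens[i : i + ngram]) == tail
    )
    return count >= max_repeats

def run_with_guard(generate_stream, prompt: str) -> str:
    """Stream tokens from a hypothetical `generate_stream` iterator and
    cut the run short once the model starts re-emitting itself."""
    output = []
    for token in generate_stream(prompt):
        output.append(token)
        if detect_repetition(" ".join(output)):
            break  # stop instead of looping forever
    return " ".join(output)
```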
It's great that DeepSeek-R1 fixes that.
The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email. QwQ reasoned about why I asked it to summarize the email. Or, on hard math questions, o1 could employ more search strategies than QwQ. I'm curious how DeepSeek-R1 will fare in that regard.
Either way, I'm super excited that DeepSeek-R1 comes with an MIT license. This will notably increase how many people can evaluate advanced reasoning models.
The R1 GitHub repo is way more exciting than I had thought.
They aren't only open sourcing R1 as an advanced reasoning model. They are also introducing a pipeline to "teach" existing models how to reason and align with human preferences. [2] On top of that, they fine-tuned Llama and Qwen models using this pipeline, and they are open sourcing those fine-tuned models as well. [3]
This is *three separate announcements* bundled as one. There's a lot to digest here. Are there any AI practitioners who could share more about these announcements?
[2] We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.
[3] Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.
I see it in the "2. Model Summary" section (for [2]). In the next section, I see links to Hugging Face to download the DeepSeek-R1 Distill Models (for [3]).
Is o3 that much better than o1? It can solve that ARC-AGI benchmark thing at huge compute cost, but even with o1, the main attraction for me is that it can spit out giant blocks of code while following huge prompts.
I'm kinda ignorant, but I'm not sure in what way o3 is better.
> It can solve that ARC-AGI benchmark thing at huge compute cost
Considering DeepSeek-V3 was trained for $5-6M and their R1 API pricing is ~30x less than o1's, I wouldn't expect the huge compute cost to hold true for long. Also, it seems like OpenAI isn't great at optimization.
4o is more expensive than DeepSeek-R1, so…? Even if we take your premise as true and say their models are as good as DeepSeek's, that would just mean OpenAI is wildly overcharging its users.
Now OpenAI has no choice but to ship cheaper versions of o1 and o3. The alternative is everyone using R1 (self-hosted or via OpenRouter, Nebius AI, Together AI, and co).
I think open source AI has a solid chance of winning if the Chinese keep funding it with great abandon as they have been. Not to mention Meta of course, whose enthusiasm for data center construction shows no signs of slowing down.
> The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email. QwQ reasoned about why I asked it to summarize the email. Or, on hard math questions, o1 could employ more search strategies than QwQ. I'm curious how DeepSeek-R1 will fare in that regard.
This is probably the result of a classifier that determines, up front, whether the model has to go through the whole CoT. On tough problems it usually does; otherwise, it just answers as is. Many papers (the test-time-compute scaling one, and the MCTS one) have described this as a necessary strategy for producing good outputs across all kinds of inputs.
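A minimal sketch of that routing idea, assuming a stand-in keyword heuristic and placeholder answer paths (a real system would use a trained classifier, and nothing here is OpenAI's actual setup):

```python
# Illustrative gate: spend long chain-of-thought compute only when a
# cheap classifier judges the prompt to be hard.

HARD_MARKERS = ("prove", "integral", "optimize", "puzzle", "theorem")

def looks_hard(prompt: str) -> bool:
    """Stand-in difficulty classifier; real systems train a small model."""
    p = prompt.lower()
    return any(marker in p for marker in HARD_MARKERS)

def answer(prompt: str) -> str:
    if looks_hard(prompt):
        # Expensive path: long CoT / search before the final answer.
        return run_with_long_cot(prompt)
    # Cheap path: answer directly, e.g. "summarize this email".
    return run_direct(prompt)

def run_with_long_cot(prompt: str) -> str:   # placeholder
    return f"[deliberate reasoning] answer to: {prompt}"

def run_direct(prompt: str) -> str:          # placeholder
    return f"[direct] answer to: {prompt}"
```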
> The other thing was that o1 had access to many more answer / search strategies. For example, if you asked o1 to summarize a long email, it would just summarize the email.
The full o1 reasoning traces aren't available; you just have to guess what it is or isn't doing from the summary.
Sometimes you put in something like "hi" and it says it thought for 1 minute before replying "hello."
o1 layers: "Why did they ask me hello. How do they know who I am. Are they following me. We have 59.6 seconds left to create a plan on how to kill this guy and escape this room before we have to give a response....
... and after also taking out anyone that would follow thru in revenge and overthrowing the government... crap .00001 seconds left, I have to answer"
IMO this is the thing we should be scared of, rather than the paperclip-maximizer scenarios. If the human brain is a finitely complicated system, and we keep improving our approximation of it as a computer program, then at some point the programs must become capable of subjectively real suffering. Like the hosts from Westworld or the mecha from A.I. (the 2001 movie). And maybe (depending on philosophy, I guess) human suffering is _only_ real subjectively.
Yes, o1 hid its reasoning. Still, it also provided a summary of its reasoning steps. In the email case, o1 thought for six seconds, summarized its thinking as "summarizing the email", and then provided the answer.
We saw this in other questions as well. For example, if you asked o1 to write a "python function to download a CSV from a URL and create a SQLite table with the right columns and insert that data into it", it would immediately produce the answer. [4] If you asked it a hard math question, it would try dozens of reasoning strategies before producing an answer. [5]
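For the curious, a minimal sketch of the function that prompt describes, using only the standard library; storing every column as TEXT (rather than inferring types) is my own simplification, and this isn't o1's actual output:

```python
import csv
import io
import sqlite3
import urllib.request

def csv_url_to_sqlite(url: str, db_path: str, table: str) -> None:
    """Download a CSV from `url` and load it into `table` in the SQLite
    database at `db_path`, creating the table from the header row."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]

    # Quote column names; store everything as TEXT for simplicity.
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)

    with sqlite3.connect(db_path) as conn:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', data
        )
```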
I think o1 does do that. It once spat out the name of the expert model for programming in its “inner monologue” when I used it. Click on the grey “Thought about X for Y seconds” label and you can see the internal monologue.
> Now for summarizing email itself it seems a bit more like a waste of compute
This is the line of thinking that led to 4o being embarrassingly unable to do simple tasks. The second you fall into the level of task OpenAI doesn't consider “worth the compute cost”, you get to see it fumble about, trying to do the task with poorly written Python code, and suddenly it can't even do basic things, like correctly counting items in a list, that OG GPT-4 would get right in a second.
[1] https://github.com/ubicloud/ubicloud/discussions/2608