DeepSeek-R1's multi-step bootstrapping process, which starts from their DeepSeek-V3 base model, appears to need only a small amount of reasoning data for the initial DeepSeek-R1-Zero RL training; after that, the model itself becomes the source of further reasoning data, along with some other sources that they mention.
Of course it's possible that DeepSeek used O1 to generate some of this initial bootstrapping data, but it's not obvious. O1 in any case deliberately obfuscates its reasoning process (see the "Hiding the chains of thought" section of OpenAI's "Learning to reason with LLMs" page), such that what you see is an after-the-fact "summary" of what it actually did. So if DeepSeek did indeed train on some of O1's output, it would show that the details of O1's own reasoning process aren't as important as OpenAI thought: what matters for getting started is simply having some verified reasoning data (i.e. reasoning that leads to a good outcome) from any source.