If true, the question is: did they use ChatGPT outputs only to create DeepSeek V3, or is the R1-Zero training process itself a complete lie (given that its whole premise is that they used pure reinforcement learning)? If they used ChatGPT outputs only when training V3, then they basically succeeded in replicating the jump from GPT-4o to o1 without any human-labeled CoT (and published the results), which is a big achievement on its own.