There are literally public datasets of ChatGPT conversations. For the past two years it's been common practice for pretty much all open source models to train on them. Ask just about any open source model who it is and a lot of the time it'll say it's ChatGPT. Why is "having obtained o1-generated data" suddenly such huge news, to the point of warranting conspiracy theories about undisclosed or undiscovered breaches at OpenAI? Nobody ever made a fuss about public ChatGPT datasets until now. No hacking of OpenAI is needed to obtain ChatGPT data.