
> We used novel synthetic data generation techniques, such as distilling outputs from OpenAI o1-preview, to post-train the model for its core behaviors. This approach allowed us to rapidly address writing quality and new user interactions, all without relying on human-generated data.

So they fed a bunch of human-generated data into o1, then used o1's outputs to train canvas? How can they claim this is a completely synthetic dataset? Humans were still involved in providing the underlying data.
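For concreteness, the distillation recipe the quote describes looks roughly like this. This is a toy sketch of the general pattern, not OpenAI's actual pipeline: `teacher` is a hypothetical stand-in for a stronger model like o1-preview, and the seed prompts are made up.

```python
def teacher(prompt: str) -> str:
    # Hypothetical stand-in for a stronger model (e.g. o1-preview).
    return f"[teacher answer to: {prompt}]"

def build_distillation_set(seed_prompts):
    # The (prompt, completion) pairs become the post-training set for
    # the student model. Note that the seed prompts are typically
    # human-written, which is exactly the point above: the data is
    # "synthetic" only downstream of human-originated inputs.
    return [{"prompt": p, "completion": teacher(p)} for p in seed_prompts]

dataset = build_distillation_set(
    ["Summarize this memo.", "Rewrite this paragraph for tone."]
)
print(len(dataset))  # 2
```

So "no human-generated data" really means "no humans labeling the final training pairs" — humans still wrote the prompts and, transitively, the teacher's own training data.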


