
The existence of R1-Zero is evidence against any sort of theft of OpenAI's internal CoT data. The model sometimes outputs illegible text that's useful only to itself; you can't do distillation without a shared vocabulary. The only way R1-Zero could exist is if they trained it with RL.
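To make the vocabulary point concrete, here's a rough sketch of token-level distillation (my illustration, not anything from DeepSeek's actual pipeline): the loss is a KL divergence between teacher and student distributions over the same vocabulary, so mismatched tokenizers break it outright.

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, temperature=2.0):
        # Both tensors must be [batch, seq, vocab] with an identical
        # vocab size *and* token ordering, i.e. a shared tokenizer.
        assert student_logits.shape == teacher_logits.shape, \
            "distillation needs a shared vocabulary"
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2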


I don’t think anyone is really suggesting they stole the CoT or that it leaked, but rather that o1's final outputs were used to more easily train the base model and reasoning components.
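For what it's worth, that kind of reuse only needs (prompt, answer) text pairs, not the hidden chain of thought. Something like the data-prep sketch below would do it, assuming a HuggingFace-style tokenizer; build_sft_example is a hypothetical helper, not a claimed DeepSeek recipe.

    def build_sft_example(tokenizer, prompt, answer, max_len=2048):
        # Concatenate prompt + answer; mask the prompt in the labels so
        # loss is only computed on the answer tokens (-100 is ignored by
        # the usual cross-entropy setup).
        prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
        answer_ids = tokenizer(answer + tokenizer.eos_token,
                               add_special_tokens=False)["input_ids"]
        input_ids = (prompt_ids + answer_ids)[:max_len]
        labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
        return {"input_ids": input_ids, "labels": labels}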


The RL is done on problems with verifiable answers. I’m not sure how o1 slop would be at all useful in that respect.
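Right, the reward in that setup is typically rule-based: extract the final answer and check it against ground truth, roughly like this (the \boxed{} convention is just an assumption for illustration). No teacher-model output enters the loop.

    import re

    def verifiable_reward(completion: str, ground_truth: str) -> float:
        # Reward 1.0 only if the extracted final answer matches the known
        # ground truth, 0.0 otherwise.
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        if match is None:
            return 0.0
        return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0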



