Isn't reasoning, a.k.a. test-time compute, ultimately just another form of scaling? Yes, it happens at a different stage, but the equation is still "more total compute → more intelligence." In that sense, combining their biggest pre-trained models with their best RL-derived reasoning strategies could be the most impactful scaling lever available to them right now.