Correct me if I'm mistaken, but wasn't the argument back then about whether LLMs could solve maths problems without e.g. writing Python to solve them? Because when "Sparks of AGI" came out in March 2023, prompting gpt-3.5-turbo to write code to assist in solving maths problems, rather than solving them directly, was already established and seemed like the path forward. Heck, it's still the way to go, despite major advancements.
Given that, was he truly mistaken in his assertions about LLMs solving maths? The same goes for "planning".
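To make the distinction concrete, here's a toy illustration (my own hypothetical example, not from the paper) of what "writing Python to solve" means: rather than producing the answer token-by-token, the model emits a snippet like this and reports its output.

```python
# Toy problem: how many integers in 1..1000 are divisible by 3 or by 5?
# A tool-using model would emit this code instead of computing in-context.
count = sum(1 for n in range(1, 1001) if n % 3 == 0 or n % 5 == 0)
print(count)  # -> 467
```

The arithmetic gets offloaded to the interpreter, which is exactly why "no-tool" evaluations are a different (and harder) test.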
AIME was saturated with tool use (i.e. ~99%) for SotA models, but pure natural-language, no-tool runs still perform "unreasonably well" on the task. Not 100%, but still above 90%. And with lots of compute they can apparently reach 99% as well [1] (at 512 rollouts, but still).
But isn't tool use kinda the crux here?