Correct me if I'm mistaken, but wasn't the argument back then about whether LLMs could solve maths problems without e.g. writing Python to solve them? Because when "Sparks of AGI" came out in March 2023, prompting gpt-3.5-turbo to write code to assist in solving maths problems, rather than solving them directly, was already established and seemed like the path forward. Heck, it's still the way to go, despite major advancements.
Given that, was he truly mistaken in his assertions about LLMs solving maths? The same goes for "planning".
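To make the distinction concrete, here's a toy illustration (my own hypothetical example, not from the paper) of what "writing Python to solve" means: rather than producing the answer token-by-token, the model emits a snippet like this and reports its output.

```python
# Toy problem: how many integers in 1..1000 are divisible by 3 or by 5?
# A tool-using model would emit this code instead of computing in-context.
count = sum(1 for n in range(1, 1001) if n % 3 == 0 or n % 5 == 0)
print(count)  # -> 467
```

The arithmetic gets offloaded to the interpreter, which is exactly why "no-tool" evaluations are a different (and harder) test.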
AIME was saturated with tool use (i.e. ~99%) for SotA models, but pure natural-language, no-tool runs still perform "unreasonably well" on the task. Not 100%, but still above 90%. And with lots of compute they can apparently reach 99% as well [1] (at 512 rollouts, but still).
But isn't tool use kinda the crux here?