GPT-4.5 Preview scored 45% on aider's polyglot coding benchmark [0]. OpenAI desc...

doctoboggan · 2025-02-27T21:45:37 1740692737

I was waiting for your comment and wow... that's bad.

I guess they are ceding the LLMs-for-coding market to Anthropic? I remember seeing an industry report somewhere that claimed software development is the largest use of LLMs, so it seems weird to give up in this area.

Workaccount2 · 2025-02-27T23:38:21 1740699501

4.5 lies on a different path than their STEM models.

o3-mini is an extremely powerful coding model and is unquestionably in the same league as 3.7. o3 is still the top STEM model overall.

nwienert · 2025-02-28T05:47:23 1740721643

No way, I've found o3-mini to be terrible. It's not as good as R1/Sonnet 3.5.

I_am_tiberius · 2025-02-27T22:21:11 1740694871

I assume they're going all in on the "new Google" direction. Embedded ads coming soon in the free version (chat.com), I guess.