To put that in context, Claude 3.5 Sonnet (new), a model we have had for months now and which from all accounts seems to have been cheaper to train and is cheaper to use, is still ahead of GPT-4.5 at 36.1% vs 32.6% in SWE-Lancer Diamond [0]. The more I look into this release, the more confused I get.