I'll be trying out Sonnet 3.7 extended thinking + Sonnet 3.5 or Flash 2.0, which I assume would be at the top of the leaderboard.
I'd like to see that benchmark, but R1 + 3.7 should be cheaper than 3.7T + 3.7.
Flash 2.0 got 100% on the edit format, and it's extremely cheap, so I'm pretty curious how that would score.