Just want to say nice job and keep it up. Thrilled to start playing with 3.7.
In general, benchmarks seem to be very misleading in my experience, and I still prefer Sonnet 3.5 for _nearly_ every use case, except massive text tasks, for which I use Gemini 2.0 Pro with its 2M token context window.
I find the webdev arena tends to match my experience with models much more closely than other benchmarks: https://web.lmarena.ai/leaderboard. Excited to see how 3.7 performs!
We find that Claude is really good at test-driven development, so we often ask Claude to write tests first and then iterate against those tests.
Time to actually read Test-Driven Development By Example, my friend. Or if you can't stomach reading a whole book, read this: https://tidyfirst.substack.com/p/canon-tdd
TL;DR - If you're writing more than one failing test at a time, you are not doing Test-Driven Development.
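For anyone who hasn't read the post: my paraphrase of the loop it describes is (1) write a list of test scenarios, (2) turn exactly one item into a concrete, runnable test, (3) change the code until that test and all previous ones pass, (4) optionally refactor, (5) repeat until the list is empty. Below is a rough sketch of what the code looks like two cycles in. `string_sum` and the scenario list are made up purely for illustration. Note that the third scenario has no test yet; it only gets written once the first two pass.

```python
import unittest

# Hypothetical scenario list (step 1). Only one item at a time becomes a real test.
TEST_LIST = [
    "empty string sums to 0",             # cycle 1: test_empty
    "single number sums to itself",       # cycle 2: test_single
    "comma-separated numbers are added",  # not yet written: still on the list
]


def string_sum(text: str) -> int:
    """Hypothetical function under test, with just enough code for the tests so far."""
    if text == "":
        return 0  # added in cycle 1 to make test_empty pass
    return int(text)  # added in cycle 2 to make test_single pass


class StringSumTest(unittest.TestCase):
    # Cycle 1: the only failing test until string_sum handled "".
    def test_empty(self):
        self.assertEqual(string_sum(""), 0)

    # Cycle 2: written only after test_empty was green.
    def test_single(self):
        self.assertEqual(string_sum("7"), 7)
```

Running it with `python -m unittest <file>` should show both tests green. The next step would be turning "comma-separated numbers are added" into a single new failing test, not writing the whole remaining list up front.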
Getting things done requires a lot of book smarts, but also a lot of "street smarts": knowing when to answer quickly, when to double back, etc.