Ten hours and counting! That's made a lovely mess of my day. Fell back to the Mistral API for some things, but it's (a) much slower and (b) not as good. Reminder to self: have fail-over in place already, rather than having to whip it up on the fly.
Same here; it broke our product. The issue is that there's an (albeit small) subset of tasks where Claude is best in class and can't be replaced by any other model without a drop in quality. Even so, I'm taking away the same lesson, and it's unfortunately leading me to consider defaulting to other providers for new tasks, despite generally preferring Claude.
I’m not going to be defaulting to other providers for new tasks - just putting a fail-over in place.
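For anyone curious what that looks like, here's a minimal sketch of the fail-over pattern: try the primary provider, fall back to the next on any error. The provider functions and behaviour here are hypothetical stand-ins, not my actual setup:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-failover")

def call_claude(prompt: str) -> str:
    # Stand-in for a real Anthropic API call; here it simulates an outage.
    raise TimeoutError("primary provider unavailable")

def call_mistral(prompt: str) -> str:
    # Stand-in for a real Mistral API call.
    return f"(mistral) answer to: {prompt}"

# Providers in priority order: primary first, fallback(s) after.
PROVIDERS = [("claude", call_claude), ("mistral", call_mistral)]

def complete(prompt: str) -> str:
    """Try each provider in order; return the first successful response."""
    last_err: Exception | None = None
    for name, fn in PROVIDERS:
        try:
            return fn(prompt)
        except Exception as err:  # fail over on any provider error
            log.warning("provider %s failed: %s; failing over", name, err)
            last_err = err
    raise RuntimeError("all providers failed") from last_err

print(complete("Summarise this support ticket."))
```

The point isn't the ten lines of code, it's having the fallback path wired up and tested before the outage, so switching is a config change rather than a scramble.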
Out of interest, what small set of tasks do you find Claude to be best for? I find it significantly better for most things. The only thing I've found it isn't better at, for my use cases, is identifying specific pieces of (somewhat specialist) machinery and equipment from images, where I'm still getting stronger results via OpenAI.
We mostly do multimodal tasks (vision + text), and there the differences between flagship models are still much bigger than on pure text. For us, the benchmarks showing them all being close are pretty meaningless; it really depends on the specific task when vision is involved.
Our pure text tasks are generally quite simple, so for price and speed reasons those don't use Sonnet but instead Llama 3.0, a very old version of GPT-3.5 Turbo (the newer versions are awful), or GPT-4o-mini.
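Roughly, the routing ends up looking like this. A toy sketch with placeholder model identifiers, not our real config:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # "vision" or "text"
    complexity: str  # "simple" or "hard"

def pick_model(task: Task) -> str:
    # Vision: quality gaps between flagships are large, so the model is
    # pinned per task after evaluation rather than using a single default.
    if task.kind == "vision":
        return "flagship-vision-model"  # placeholder, chosen per task
    # Simple text: optimise for price and speed with a cheap model.
    if task.complexity == "simple":
        return "cheap-fast-text-model"  # placeholder for a small/turbo model
    # Everything else goes to the strongest general model.
    return "claude-sonnet"  # placeholder for the default strong model

print(pick_model(Task(kind="text", complexity="simple")))
print(pick_model(Task(kind="vision", complexity="hard")))
```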
Sorry for the hassle, Sam. We're seeing decreasing error rates on the API now. This was painful -- appreciate your patience while we work through the issue with one of our infra providers.
I bet your day was even more filled with hassle! But I genuinely appreciate the response. One thing this has really highlighted to me is that you're miles ahead of the alternative APIs available. It's also shown quite how much I've come to rely on not just the API but the other little workflows I've made in Claude itself. I felt quite bereft for most of the morning.