I’m not going to be defaulting to other providers for new tasks - just putting a...

maeil · 2024-08-09T05:04:30

We mostly do multimodal tasks (vision + text), and there the differences between flagship models are still much bigger than on text-only tasks. For us, the benchmarks showing them all being close are pretty meaningless; when vision is involved, it really depends on the task.

Our pure-text tasks are generally quite simple, so for price and speed reasons they don't use Sonnet. Instead they run on Llama 3.0, a very old version of GPT-3.5 Turbo (the newer versions are awful), or GPT-4o-mini.