Someone who has built infrastructure or system prompts around Opus will probably stick with Opus until they verify that everything works on Sonnet 3.5.
For one, benchmarks don't cover all possible use cases. A model can do better on every benchmark thrown at it and still do worse in practice for your specific use case.