It seems to be the way with these releases: sticking with Claude, at least for the 'hard' tasks. In my agent platform I have LLMs assigned to easy/medium/hard categorised tasks, which was somewhat inspired by the Claude 3 release with Haiku/Sonnet/Opus. GPT-4o mini has bumped Haiku for the easy category for now. Sonnet 3.5 bumped Opus for the hard category, so I could possibly downgrade the medium tasks from Sonnet 3.5 to Mistral Large 2 if the price is right on the platforms, given it's only 123B params compared to 405B. I was surprised how much Llama 3 405B was on together.ai: $5/M tokens for both input and output! I'll stick to Sonnet 3.5. Then I was also surprised how much cheaper Fireworks was at $3/M.
Gemini has two aces up its sleeve now: the long context, and now context caching for a 75% reduction in input token cost. I was looking at the "Improving Factuality and Reasoning in Language Models through Multiagent Debate" paper the other day, and thought Gemini would have a big cost advantage implementing this technique with context caching. If only Google could get their model up to the level of Anthropic.
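To make the cost angle concrete, here's a minimal sketch of the multi-agent debate loop: several agents answer independently, then each revises its answer after seeing its peers' answers, over a few rounds. The `query_model` function is a hypothetical stub standing in for a real LLM call (here it just averages numeric answers for illustration); the point is that every call re-sends the same long shared context, which is exactly the part Gemini's context caching would bill at the reduced rate.

```python
# Sketch of multi-agent debate with a shared (cacheable) context.
# `query_model` is a placeholder for a real LLM API call; a real version
# would pass the cached shared context plus the peers' latest answers.

def query_model(shared_context, agent_answer, peer_answers):
    # Placeholder critique/revise step: nudge this agent's answer toward
    # its peers' by averaging. A real model would reason over the text.
    return (agent_answer + sum(peer_answers)) / (1 + len(peer_answers))

def debate(shared_context, initial_answers, rounds=3):
    answers = list(initial_answers)
    for _ in range(rounds):
        # Synchronous update: every agent sees the same round's answers.
        answers = [
            query_model(shared_context, ans, answers[:i] + answers[i + 1:])
            for i, ans in enumerate(answers)
        ]
    return answers

# Three agents start with different answers and converge over the rounds.
final = debate("long shared problem description", [1.0, 5.0, 9.0])
print(final)  # all three agents converge on the mean, 5.0
```

With N agents and R rounds, the shared context gets sent N×R times, so a 75% discount on those repeated input tokens is where the cost advantage comes from.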