That said, while I've really enjoyed the LLM abstraction (making it easy for me to test different models without changing my code), I haven't felt any desire for a router. I _do_ have some prompts that I send to gpt-3.5-turbo, and could potentially use other models, but it's kind of niche.
In part this is because I try to do as much in a single prompt as I can, meaning I want to use a model that's able to handle the hardest parts of the prompt, and then the easy parts come along for the ride. As a result there aren't many "easy" prompts. The easy prompts are usually text fixup and routing.
My "routing" prompts are at a different level of abstraction, usually routing some input or activity to one of several prompts (each of which has its own context, and the sum of all contexts across those prompts is too large, hence the routing). I don't know if there's some meaningful crossover between these two routing concepts.
Another issue I have with LLM portability is the use of tools/functions/structured output. Opus and Gemini 1.5 Pro have kind of implemented this OK, but until recently GPT was the only halfway decent implementation. This seems to be an "advanced" feature, yet it's also a feature I use even more with smaller prompts, as those small prompts are often inside some larger algorithm and I don't want the fuss of text parsing and exceptions from ad hoc output.
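For what it's worth, the pattern I rely on looks roughly like this with OpenAI-style function calling (the schema and names are just examples): forcing a tool call means I get parsed arguments back instead of free text to parse.

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative schema; forcing the tool call below guarantees the reply
# arrives as arguments matching it, not as free-form text.
tool = {
    "type": "function",
    "function": {
        "name": "record_sentiment",
        "parameters": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["pos", "neg", "neutral"]},
                "confidence": {"type": "number"},
            },
            "required": ["sentiment", "confidence"],
        },
    },
}

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Classify: 'works great, thanks!'"}],
    tools=[tool],
    tool_choice={"type": "function", "function": {"name": "record_sentiment"}},
)
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
# args is a dict matching the schema, so no ad hoc text parsing is needed
```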
But in the end I'm not price sensitive in my work, so I always come back to the newest GPT model. If I make a switch to Opus it definitely won't be to save money! And I'm probably not going to want to fiddle, but instead make a thoughtful choice and switch the default model in my code.
Although, to your point, we have seen less market pull for routing, and more for (a) supporting the latest LLMs, (b) basic translation (e.g. the tool call API between Anthropic & OpenAI), and (c) solid infra features like caching, load-balancing API keys, and secret management. So that's our focus.
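To give a flavour of (b): the translation is mostly mechanical field mapping. A simplified sketch for one direction (no error handling, streaming, or parallel calls):

```python
import json

def openai_tool_call_to_anthropic(tc: dict) -> dict:
    # Map an OpenAI-style tool call onto Anthropic's tool_use block:
    # arguments arrive as a JSON string on one side, a dict on the other.
    return {
        "type": "tool_use",
        "id": tc["id"],
        "name": tc["function"]["name"],
        "input": json.loads(tc["function"]["arguments"]),
    }
```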
Super helpful feedback, thanks for going so deep! I agree that for the really heavy agentic stuff, the router in its current form might not be the most important innovation.
However, for several use cases speed is really paramount, and latency can directly hinder the UX. Examples include sales call agents, copilots, auto-complete engines, etc. These are some of the areas where we've seen the router really shine, diverting to slow models only when absolutely necessary on complex prompts, but using fast models as often as possible to minimize disruption to the UX.
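As a toy illustration of that pattern (not our actual router; the stand-in heuristic here plays the role a trained routing model would):

```python
FAST_MODEL = "gpt-3.5-turbo"  # low latency
SLOW_MODEL = "gpt-4"          # higher quality, higher latency

def looks_complex(prompt: str) -> bool:
    # Stand-in heuristic; a real router learns this decision from data.
    return len(prompt) > 2000 or "step by step" in prompt.lower()

def pick_model(prompt: str) -> str:
    # Default to the fast model; escalate only when flagged as complex.
    return SLOW_MODEL if looks_complex(prompt) else FAST_MODEL
```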
Having said that, another major benefit of the platform is the ability to quickly run objective benchmarks for quality, cost and speed across all models and providers, on your own prompts [https://youtu.be/PO4r6ek8U6M]. We have some users who run benchmarks regularly for different checkpoints of their fine-tuned model, comparing against all other custom fine-tuned models, as well as the various foundation models.
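Hand-rolled, the speed side of such a benchmark is just a timing loop like the sketch below (a single provider here for brevity; quality scoring across providers is the harder part that the platform automates):

```python
import time
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # extend with other checkpoints/models

def bench(prompt: str) -> None:
    # Crude speed/cost proxies: wall-clock latency and token usage.
    for model in MODELS:
        t0 = time.perf_counter()
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        dt = time.perf_counter() - t0
        print(f"{model}: {dt:.2f}s, {resp.usage.total_tokens} tokens")
```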
As for the overlap in routing concepts you mentioned, I've thought a lot about this actually. It's our intention to broaden the kinds of routing we're able to handle, where we assume all control flow decisions (routing) and intermediate prompts are latent variables (the DSPy perspective). In the immediate future there is no crossover, though.
I agree cost is often an afterthought. Generally our users either care about improving speed, or they want to know which model or combination of models would be best for their task in terms of output quality (GPT-4, Opus, Gemini, etc.). This is not trivial to gauge without performing benchmarks.
As for usually wanting to make a full LLM switch as opposed to routing, what's the primary motivation? Avoiding extra complexity + dependencies in the stack? Perhaps worrying about model-specific prompts no longer working well with a new model? The general loss of control?