
Isn't this what a MoE LLM does already?


MoE-based models are a single model with multiple experts. This solution could route across entirely different models with different architectures (and probably supports MoE models themselves as routing targets).


exactly!


MoE LLMs use several "expert" fully connected layers, which are routed to during the forward pass and trained end-to-end. This approach can also work with black-box LLMs like Opus, GPT-4, etc. It's a similar concept, but operating at a higher level of abstraction.
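To make the distinction concrete, here's a minimal sketch of what "routing at a higher level of abstraction" could look like: instead of routing between expert layers inside one network, a lightweight router picks an entire model per request. The call_* functions and the keyword-based routing rule are hypothetical placeholders, not anyone's actual implementation.

    from typing import Callable, Dict

    # Hypothetical stand-ins for real API clients (e.g. Opus, GPT-4, a local model).
    def call_code_model(prompt: str) -> str:
        return f"[code-specialist answer to: {prompt}]"

    def call_general_model(prompt: str) -> str:
        return f"[general-purpose answer to: {prompt}]"

    # Registry of independently trained, possibly differently architected models.
    EXPERTS: Dict[str, Callable[[str], str]] = {
        "code": call_code_model,
        "general": call_general_model,
    }

    def route(prompt: str) -> str:
        # Toy routing rule; a real system might use a small classifier model here.
        keywords = ("python", "bug", "compile")
        return "code" if any(k in prompt.lower() for k in keywords) else "general"

    def answer(prompt: str) -> str:
        return EXPERTS[route(prompt)](prompt)

    print(answer("Why does my Python loop never terminate?"))
    print(answer("Summarize the French Revolution."))

In an MoE layer the router and experts are trained together via backprop; here the router only sees prompts and responses, so it can treat each model as a black box.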



