
Isn't this what a MoE LLM does already?


MoE-based models are a single model with multiple experts. This solution could route across entirely different models with different architectures (and probably supports MoE models themselves as routing targets).


exactly!


MoE LLMs use several "expert" fully connected layers, which are routed to during the forward pass and trained end-to-end. This approach can also work with black-box LLMs like Opus, GPT-4, etc. It's a similar concept, but operating at a higher level of abstraction.
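To make the distinction concrete, here's a minimal sketch of what "routing at a higher level of abstraction" could look like: instead of routing between expert layers inside one network, a lightweight router picks an entire model per request. The call_* functions and the keyword-based routing rule are hypothetical placeholders, not anyone's actual implementation.

    from typing import Callable, Dict

    # Hypothetical stand-ins for real API clients (e.g. Opus, GPT-4, a local model).
    def call_code_model(prompt: str) -> str:
        return f"[code-specialist answer to: {prompt}]"

    def call_general_model(prompt: str) -> str:
        return f"[general-purpose answer to: {prompt}]"

    # Registry of independently trained, possibly differently architected models.
    EXPERTS: Dict[str, Callable[[str], str]] = {
        "code": call_code_model,
        "general": call_general_model,
    }

    def route(prompt: str) -> str:
        # Toy routing rule; a real system might use a small classifier model here.
        keywords = ("python", "bug", "compile")
        return "code" if any(k in prompt.lower() for k in keywords) else "general"

    def answer(prompt: str) -> str:
        return EXPERTS[route(prompt)](prompt)

    print(answer("Why does my Python loop never terminate?"))
    print(answer("Summarize the French Revolution."))

In an MoE layer the router and experts are trained together via backprop; here the router only sees prompts and responses, so it can treat each model as a black box.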



