Thanks a lot for this video, best LLM usage tutorial I've seen so far.
At https://youtu.be/jkrNMKz9pWU?si=Dvz-Hs4InJXNozhi&t=3278, when talking about valid use cases for a local model vs GPT4, you say: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine tuning, and these are all things that you absolutely can get better than GPT4 performance".
Regarding this, there's an idea I've been thinking about for some time: imagine a chatbot that is backed by multiple "small" models (e.g. 7B parameters), where each model is fine-tuned for a specific task. Could such a system outperform GPT4?
Here's a high-level overview of how I imagine this working (a rough code sketch follows the list):
- Context/prompt is sent to a "router model", which is trained to determine what kind of expert model can best answer/complete the prompt.
- The system then passes the context/prompt to the expert model and returns that answer.
- If no suitable expert model is found, fall back to a generic instruct-tuned general-purpose LLM to answer.
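To make the idea concrete, here's a minimal Python sketch of that router-plus-experts dispatch. Everything in it is hypothetical: the keyword matcher stands in for a small trained router model, and `call_expert` / `call_generalist` stand in for whatever inference API the fine-tuned 7B models would actually sit behind.

```python
# Hypothetical sketch of the router -> expert dispatch described above.
# The router here is a trivial keyword classifier standing in for a small
# model trained to pick the best expert; the "experts" are placeholders
# for fine-tuned 7B models served behind some inference API.
from typing import Optional

EXPERT_KEYWORDS = {
    "sql_expert": ["select", "join", "query", "database"],
    "law_expert": ["contract", "liability", "statute"],
    "physics_expert": ["momentum", "quantum", "relativity"],
}

def route(prompt: str) -> Optional[str]:
    """Return the name of the expert best suited to the prompt, or None."""
    text = prompt.lower()
    scores = {
        name: sum(kw in text for kw in kws)
        for name, kws in EXPERT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def call_expert(name: str, prompt: str) -> str:
    # Placeholder: in a real system this would call the fine-tuned 7B model.
    return f"[{name}] answer to: {prompt}"

def call_generalist(prompt: str) -> str:
    # Placeholder: generic instruct-tuned fallback model.
    return f"[generalist] answer to: {prompt}"

def answer(prompt: str) -> str:
    expert = route(prompt)
    return call_expert(expert, prompt) if expert else call_generalist(prompt)

if __name__ == "__main__":
    print(answer("How do I JOIN two tables in a query?"))   # -> sql_expert
    print(answer("Write a haiku about autumn."))            # -> generalist
```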
If you can theoretically get better-than-GPT4 performance from a small model fine-tuned for a specific task, maybe a cluster of such small models could collectively outperform GPT4. Does that make sense?
SambaNova just launched something similar to what you're describing. It's a demo of their new chip running a 1T-parameter MoE model composed of 150 7B llama2s, each retrained to be an expert in a different topic. So one of them is a "law" expert, another a "physics" expert, etc.
They've got a video here [1] (scroll down slightly) that compares it against a 180B Falcon model running on GPUs on HuggingFace. The MoE results are not only just as good quality-wise, but also ridiculously fast, nearly instant. A big benefit is that the experts can be swapped out and retrained with new data, which is obviously not as easy with the more monolithic 180B model.
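For intuition about the routing mechanism in a mixture-of-experts, here's a toy NumPy sketch of top-1 gating. This is purely illustrative and not SambaNova's implementation: a tiny gating network scores each "expert" weight matrix for a given input, and only the winning expert runs.

```python
# Toy illustration of top-1 mixture-of-experts routing (not SambaNova's
# actual system): a gating network scores each expert for a given input
# vector, and only the winning expert's weights are applied.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

gate_w = rng.normal(size=(d_model, n_experts))             # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-1 expert."""
    logits = x @ gate_w                                     # score each expert
    k = int(np.argmax(logits))                              # pick the winner
    return experts[k] @ x                                   # only that expert runs

token = rng.normal(size=d_model)
print("routed to expert", int(np.argmax(token @ gate_w)))
print(moe_forward(token).shape)                             # (8,)
```

Swapping out or retraining one expert then only means replacing one entry in `experts`, which is the flexibility the demo highlights.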
It makes a lot of sense! In fact, there are a number of open-source projects working on just such a model right now. Here's a great example: https://github.com/XueFuzhao/OpenMoE/