Thanks a lot for this video, best LLM usage tutorial I've seen so far.
At https://youtu.be/jkrNMKz9pWU?si=Dvz-Hs4InJXNozhi&t=3278, when talking about valid use cases for a local model vs GPT4, you say: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine tuning, and these are all things that you absolutely can get better than GPT4 performance".
Regarding this, there's an idea I've been thinking about for some time: imagine a chatbot that is backed by multiple "small" models (e.g. 7B parameters), where each model is fine-tuned for a specific task. Could such a system outperform GPT4?
Here's a high-level overview of how I imagine this working (a rough code sketch follows the list):
- Context/prompt is sent to a "router model", which is trained to determine what kind of expert model can best answer/complete the prompt.
- The system then passes the context/prompt to the expert model and returns that answer.
- If no suitable expert model is found, fall back to a generic instruct-tuned general-purpose LLM to answer.
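To make the idea concrete, here's a minimal Python sketch of that router-plus-experts dispatch. Everything in it is hypothetical: the keyword matcher stands in for a small trained router model, and `call_expert` / `call_generalist` stand in for whatever inference API the fine-tuned 7B models would actually sit behind.

```python
# Hypothetical sketch of the router -> expert dispatch described above.
# The router here is a trivial keyword classifier standing in for a small
# model trained to pick the best expert; the "experts" are placeholders
# for fine-tuned 7B models served behind some inference API.
from typing import Optional

EXPERT_KEYWORDS = {
    "sql_expert": ["select", "join", "query", "database"],
    "law_expert": ["contract", "liability", "statute"],
    "physics_expert": ["momentum", "quantum", "relativity"],
}

def route(prompt: str) -> Optional[str]:
    """Return the name of the expert best suited to the prompt, or None."""
    text = prompt.lower()
    scores = {
        name: sum(kw in text for kw in kws)
        for name, kws in EXPERT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def call_expert(name: str, prompt: str) -> str:
    # Placeholder: in a real system this would call the fine-tuned 7B model.
    return f"[{name}] answer to: {prompt}"

def call_generalist(prompt: str) -> str:
    # Placeholder: generic instruct-tuned fallback model.
    return f"[generalist] answer to: {prompt}"

def answer(prompt: str) -> str:
    expert = route(prompt)
    return call_expert(expert, prompt) if expert else call_generalist(prompt)

if __name__ == "__main__":
    print(answer("How do I JOIN two tables in a query?"))   # -> sql_expert
    print(answer("Write a haiku about autumn."))            # -> generalist
```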
If you can theoretically get better-than-GPT4 performance from a small model fine-tuned for a specific task, maybe a cluster of such small models could collectively outperform GPT4. Does that make sense?
SambaNova just launched something similar to what you're describing. It's a demo of their new chip running a 1T-parameter MoE model composed of 150 7B llama2s, each retrained to be an expert in a different topic. So one of them is a "law" expert, another a "physics" expert, etc.
They've got a video here [1] (scroll down slightly) that compares it against a 180B Falcon model running on GPUs on HuggingFace. The MoE results are not only just as good quality-wise, but also ridiculously fast, nearly instant. A big benefit is that the experts can be swapped out and retrained with new data, which is obviously not as easy with the more monolithic 180B model.
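For intuition about the routing mechanism in a mixture-of-experts, here's a toy NumPy sketch of top-1 gating. This is purely illustrative and not SambaNova's implementation: a tiny gating network scores each "expert" weight matrix for a given input, and only the winning expert runs.

```python
# Toy illustration of top-1 mixture-of-experts routing (not SambaNova's
# actual system): a gating network scores each expert for a given input
# vector, and only the winning expert's weights are applied.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

gate_w = rng.normal(size=(d_model, n_experts))             # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-1 expert."""
    logits = x @ gate_w                                     # score each expert
    k = int(np.argmax(logits))                              # pick the winner
    return experts[k] @ x                                   # only that expert runs

token = rng.normal(size=d_model)
print("routed to expert", int(np.argmax(token @ gate_w)))
print(moe_forward(token).shape)                             # (8,)
```

Swapping out or retraining one expert then only means replacing one entry in `experts`, which is the flexibility the demo highlights.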
It makes a lot of sense! In fact, there are a number of open-source projects working on just such a model right now. Here's a great example: https://github.com/XueFuzhao/OpenMoE/