
You don't have to use the whole MoE model: for each token, only about 1/N of the expert weights are used (assuming one expert is routed per token), where N is the number of experts. So its compute cost scales more slowly than its memory usage.
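To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The function name, its parameters, and the example sizes are all made up for illustration and don't describe any particular model; it just counts how many parameters have to sit in memory versus how many are actually touched per token.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, experts_per_token=1):
    """Return (total, active) parameter counts for a toy MoE model.

    shared_params: parameters every token uses (hypothetical, e.g. attention).
    params_per_expert: parameters in one expert.
    experts_per_token: how many experts the router picks per token (assumed 1 by default).
    """
    # All experts must be held in memory.
    total = shared_params + num_experts * params_per_expert
    # Only the routed experts contribute to per-token compute.
    active = shared_params + experts_per_token * params_per_expert
    return total, active


# Toy example: 8 experts, no shared layers, one expert per token.
total, active = moe_param_counts(
    shared_params=0, params_per_expert=1_000_000_000, num_experts=8
)
print(active / total)  # 0.125, i.e. 1/N with N = 8
```

Adding more experts grows total (memory) linearly while active (compute per token) stays flat, which is the point above. Once you count shared layers or route to more than one expert per token, the active fraction ends up somewhat higher than 1/N.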

