
> This is because the entire model's weights must be run through the algorithm many times per prompt.

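And this is why I'm so excited about MoE models! qwen3:30b-a3b only activates about 3B of its 30B parameters per token, so it generates at roughly the speed of a 3B dense model. It's completely realistic to run on a plain CPU, as long as you have about 20 GB of RAM to hold the full model.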
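To put rough numbers on that, here is a back-of-envelope sketch. It assumes token generation is limited by memory bandwidth, so each token costs one pass over the weights that are actually used. The 4.5 bits per weight (a typical 4-bit quant) and the 50 GB/s bandwidth are my assumptions, not measurements, and the 30B/3B counts are just read off the model name.

```python
# Back-of-envelope only: assumes decoding is memory-bandwidth bound,
# i.e. every generated token reads each *active* weight once from RAM.

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at the given quantization."""
    return params * bits_per_weight / 8


def decode_tokens_per_sec(active_params: float,
                          bits_per_weight: float,
                          bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on tokens/s if weight reads are the only cost."""
    return bandwidth_bytes_per_sec / weight_bytes(active_params, bits_per_weight)


TOTAL_PARAMS = 30e9    # from the model name (30b); approximate
ACTIVE_PARAMS = 3e9    # from the model name (a3b); approximate
BITS = 4.5             # assumed quantization, not from the original comment
BANDWIDTH = 50e9       # assumed CPU memory bandwidth in bytes/s

# RAM is set by the TOTAL parameter count: every expert must be resident.
ram_gb = weight_bytes(TOTAL_PARAMS, BITS) / 1e9

# Speed is set by the ACTIVE parameter count: only those are read per token.
dense_tps = decode_tokens_per_sec(TOTAL_PARAMS, BITS, BANDWIDTH)
moe_tps = decode_tokens_per_sec(ACTIVE_PARAMS, BITS, BANDWIDTH)

print(f"weights in RAM: {ram_gb:.1f} GB")
print(f"dense 30B bound: {dense_tps:.1f} tok/s")
print(f"MoE 3B-active bound: {moe_tps:.1f} tok/s")
```

Under those assumptions the weights come to about 17 GB, and the speed bound is about 3 tok/s for a dense 30B versus about 30 tok/s with 3B active. Real numbers will be lower (KV cache, routing overhead, compute limits during prompt processing), but the 10x ratio is the point.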


