I'm hoping to see some smaller MoE models released this year, trained with more recent recipes (higher-quality data, much longer pretraining). Mixtral 8x7B was impressive when it came out, but the exact same architecture could be a lot more powerful today, and it would run quite fast on Apple Silicon: only ~13B of its ~47B parameters are active per token, while unified memory can hold the full weights.
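For anyone unfamiliar with why MoE decoding is cheap per token, here's a minimal sketch of Mixtral-style top-2 expert routing. Everything below uses toy dimensions and a plain ReLU FFN rather than Mixtral's real sizes and SwiGLU experts; it's an illustration of the routing idea, not the actual implementation:

    import numpy as np

    # Toy sizes loosely following Mixtral 8x7B's shape (which uses
    # d_model=4096, d_ff=14336): 8 experts, top-2 routing, so only
    # 2/8 of the expert weights are touched per token.
    D_MODEL = 64
    D_FF = 128
    N_EXPERTS = 8
    TOP_K = 2

    rng = np.random.default_rng(0)

    # Router: one linear layer producing a score per expert.
    w_router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

    # Each expert is a small two-layer FFN (ReLU instead of SwiGLU for brevity).
    w_in = rng.standard_normal((N_EXPERTS, D_MODEL, D_FF)) * 0.02
    w_out = rng.standard_normal((N_EXPERTS, D_FF, D_MODEL)) * 0.02

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def moe_layer(x):
        """Route each token to its top-k experts and mix their outputs."""
        logits = x @ w_router                            # (tokens, N_EXPERTS)
        topk = np.argsort(logits, axis=-1)[:, -TOP_K:]   # chosen expert indices
        # Renormalize gate weights over the selected experts only.
        gate = softmax(np.take_along_axis(logits, topk, axis=-1), axis=-1)

        out = np.zeros_like(x)
        for t in range(x.shape[0]):                # loop per token, for clarity
            for slot in range(TOP_K):
                e = topk[t, slot]
                h = np.maximum(x[t] @ w_in[e], 0)  # expert FFN forward pass
                out[t] += gate[t, slot] * (h @ w_out[e])
        return out

    tokens = rng.standard_normal((4, D_MODEL))
    y = moe_layer(tokens)
    print(y.shape)  # (4, 64); only 2 of 8 experts ran per token

The upshot: compute scales with the active parameters (TOP_K experts) while memory scales with the total parameters (all N_EXPERTS), which is a good fit for Apple Silicon's large unified memory and comparatively modest compute.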

