
The performance advantage comes from doing 1/32 of the floating-point operations of a dense layer with the same number of parameters.


The performance gain comes mostly from needing only a fraction of the memory bandwidth, since LLMs are mostly memory-bound. Compute matters too, but usually far less than memory.
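A rough way to check the memory-vs-compute argument is a roofline-style estimate: a layer takes about as long as the slower of "time to do the math" and "time to read the weights". The sketch below is only illustrative. The hardware figures (100 TFLOP/s, 1 TB/s), the 4096x4096 fp16 layer, batch size 1, and the helper name `layer_time_estimate` are all assumptions, not numbers from this thread, and activation traffic is ignored.

```python
def layer_time_estimate(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline-style estimate: total time is the slower of compute and memory."""
    compute_s = flops / peak_flops
    memory_s = bytes_moved / peak_bandwidth
    return max(compute_s, memory_s), compute_s, memory_s


# Hypothetical accelerator figures (assumptions, not a real device).
PEAK_FLOPS = 100e12  # 100 TFLOP/s
PEAK_BW = 1e12       # 1 TB/s

# Hypothetical dense layer: 4096 x 4096 weights in fp16, batch size 1.
rows, cols = 4096, 4096
bytes_per_weight = 2
flops = 2 * rows * cols                       # one multiply and one add per weight
weight_bytes = rows * cols * bytes_per_weight  # every weight is read once

dense_total, dense_compute, dense_memory = layer_time_estimate(
    flops, weight_bytes, PEAK_FLOPS, PEAK_BW
)

# Case A: 1/32 of the FLOPs, but the same bytes read from memory.
fewer_flops_total, _, _ = layer_time_estimate(
    flops / 32, weight_bytes, PEAK_FLOPS, PEAK_BW
)

# Case B: the same FLOPs, but 1/32 of the bytes read from memory.
fewer_bytes_total, _, _ = layer_time_estimate(
    flops, weight_bytes / 32, PEAK_FLOPS, PEAK_BW
)

print(f"dense:        {dense_total * 1e6:.2f} us "
      f"(compute {dense_compute * 1e6:.2f} us, memory {dense_memory * 1e6:.2f} us)")
print(f"1/32 FLOPs:   {fewer_flops_total * 1e6:.2f} us")
print(f"1/32 bytes:   {fewer_bytes_total * 1e6:.2f} us")
```

Under these assumed numbers the dense layer is limited by memory, so cutting FLOPs alone leaves the estimate unchanged, while cutting the bytes read shrinks it roughly in proportion. At large batch sizes the weights are reused across many inputs and the balance can shift toward compute.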



