
The paper's Table 7 shows DyT reducing overall LLaMA 7B inference time by 7.8% and training time by 8.2%. That is not insignificant.
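For context, the DyT operation the paper proposes is roughly the following (a minimal PyTorch sketch based on my reading of the paper; the init value and per-channel shapes are my assumptions):

    import torch
    import torch.nn as nn

    class DyT(nn.Module):
        # Dynamic Tanh: a learnable scalar alpha inside tanh, followed by a
        # per-channel affine, used in place of LayerNorm/RMSNorm.
        def __init__(self, dim, alpha_init=0.5):
            super().__init__()
            self.alpha = nn.Parameter(torch.full((1,), alpha_init))
            self.weight = nn.Parameter(torch.ones(dim))
            self.bias = nn.Parameter(torch.zeros(dim))

        def forward(self, x):
            # No mean/variance statistics to compute, which is where the
            # reported inference/training time savings come from.
            return self.weight * torch.tanh(self.alpha * x) + self.bias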


But LLM performance scales according to the log of compute, so yeah it’s pretty insignificant. I think we’ve reached a bit of a plateau.
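To put rough numbers on that (a back-of-envelope sketch assuming an illustrative power-law loss curve, not anything from the paper):

    import math

    # Assume a Chinchilla-style power law L = A * C**(-alpha).
    # alpha = 0.05 is an illustrative exponent, not a figure from the paper.
    alpha = 0.05
    compute_saving = 0.082  # ~8.2% training-time reduction reported for LLaMA 7B

    # Reinvesting the savings multiplies effective compute by ~1.09, which
    # lowers loss by well under one percent on this curve.
    loss_ratio = (1 / (1 - compute_saving)) ** (-alpha)
    print(f"relative loss after reinvesting the savings: {loss_ratio:.4f}")  # ~0.996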



