atgctg | 9 months ago | on: Transformers Without Normalization
The paper's Table 7 shows DyT reducing overall LLaMA 7B inference time by 7.8% and training time by 8.2%. That is not insignificant.
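For context, DyT just swaps each normalization layer for an element-wise tanh with a learnable scale and shift, so there is no mean/variance reduction over the feature dimension. A minimal sketch of that idea (parameter names and init values here are illustrative, not the authors' reference code):

    import torch
    import torch.nn as nn

    class DyT(nn.Module):
        def __init__(self, dim, alpha_init=0.5):
            super().__init__()
            self.alpha = nn.Parameter(torch.ones(1) * alpha_init)  # learnable scalar
            self.gamma = nn.Parameter(torch.ones(dim))             # per-channel scale, like LayerNorm's weight
            self.beta = nn.Parameter(torch.zeros(dim))             # per-channel shift, like LayerNorm's bias

        def forward(self, x):
            # Purely element-wise: no mean/variance statistics to compute,
            # which is where the inference/training time saving comes from.
            return self.gamma * torch.tanh(self.alpha * x) + self.beta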
Herring | 9 months ago
But LLM performance scales according to the log of compute, so yeah it’s pretty insignificant. I think we’ve reached a bit of a plateau.
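Rough arithmetic behind that point, assuming the simple performance-goes-as-log-of-compute picture (an illustration, not a number from the paper):

    import math

    time_saving = 0.082                    # ~8.2% less training time (Table 7)
    extra_compute = 1 / (1 - time_saving)  # ~1.09x compute for the same wall-clock budget
    print(math.log2(extra_compute))        # ~0.12 of one compute doubling

About an eighth of a doubling barely moves a curve that is usually plotted over orders of magnitude of compute.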