
MFU is indeed very useful. Today we found that, while scaling Karpathy’s nanoGPT to multiple H100 nodes, the MFU calculation itself was dragging down MFU![1]

Commenting it out improved per-iteration throughput by almost 30%.

1. https://github.com/modal-labs/multinode-training-guide/blob/...
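For context, nanoGPT's `estimate_mfu` follows the PaLM-style estimate: roughly 6N FLOPs per token for the forward and backward matmuls over N parameters, plus 12·L·H·Q·T for attention. It then divides achieved FLOP/s by a hardcoded peak of 312e12 (A100 BF16). Below is a minimal standalone sketch of that arithmetic. It is not the code from the linked guide. The `GPTConfig` dataclass and the function signatures are simplified stand-ins, and `peak_flops` is a parameter you would set for your hardware (for example, something near NVIDIA's dense BF16 figure for H100 SXM; that value is an assumption to check against the spec sheet).

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Simplified stand-in for nanoGPT's config (hypothetical, fields mirror nanoGPT names)
    n_layer: int
    n_head: int
    n_embd: int
    block_size: int


def flops_per_token(n_params: int, cfg: GPTConfig) -> int:
    # PaLM-style estimate used by nanoGPT:
    # 6N for forward+backward through the weight matmuls,
    # 12 * L * H * Q * T for the attention score and value matmuls.
    head_dim = cfg.n_embd // cfg.n_head
    return 6 * n_params + 12 * cfg.n_layer * cfg.n_head * head_dim * cfg.block_size


def estimate_mfu(n_params: int, cfg: GPTConfig, seqs_per_iter: int,
                 dt: float, peak_flops: float) -> float:
    # seqs_per_iter: sequences processed per iteration on ONE GPU
    # (batch_size * gradient_accumulation_steps in nanoGPT), so the
    # result is per-GPU MFU when peak_flops is a single GPU's peak.
    flops_per_iter = flops_per_token(n_params, cfg) * cfg.block_size * seqs_per_iter
    achieved = flops_per_iter / dt  # FLOP/s actually delivered
    return achieved / peak_flops


if __name__ == "__main__":
    # Illustrative GPT-2-small-like shapes; numbers are hypothetical.
    cfg = GPTConfig(n_layer=12, n_head=12, n_embd=768, block_size=1024)
    mfu = estimate_mfu(124_000_000, cfg, seqs_per_iter=12 * 40, dt=1.5,
                       peak_flops=312e12)
    print(f"mfu: {mfu * 100:.2f}%")
```

Note that the estimate itself is just a handful of scalar operations. Any overhead from it comes from how and where it is invoked inside the training loop, not from the formula.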
