
thundergolfer 4 months ago, on: A handy metric is needed for gauging if GPUs are b...

MFU is indeed very useful. Today we found that while scaling Karpathy’s nanoGPT to multiple H100 nodes, the MFU calculation itself was lowering MFU! [1] Commenting it out improved iteration performance by almost 30%.

[1] https://github.com/modal-labs/multinode-training-guide/blob/...
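For readers who haven't computed MFU before, here is a minimal sketch of the usual estimate: achieved FLOP/s divided by the hardware's peak FLOP/s. The function name, its parameters, and the 6N + 12·L·H·Q·T per-token FLOP approximation are assumptions for illustration, not the code from the linked guide; `peak_flops` is a value you would take from the vendor's spec sheet for your GPU and precision.

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, seq_len,
                 seqs_per_iter, dt_seconds, peak_flops):
    """Rough model FLOPs utilization (hypothetical helper, not from the guide).

    Returns achieved FLOP/s as a fraction of peak_flops.
    """
    # Approximate training FLOPs per token: 6 per parameter (forward + backward)
    # plus an attention term that grows with sequence length.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    # FLOPs for one optimizer step over all sequences processed in that step.
    flops_per_iter = flops_per_token * seq_len * seqs_per_iter
    # Throughput actually achieved, given the measured wall-clock step time.
    flops_achieved = flops_per_iter / dt_seconds
    return flops_achieved / peak_flops
```

The arithmetic here is a handful of host-side multiplications. The comment does not say which part of the calculation caused the slowdown, so this sketch only shows what the metric measures.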