A handy metric is needed for gauging if GPUs are being used optimally

asaiacai · 2025-05-23T03:06:30 1747969590

MFU (model FLOPs utilization) is probably the best metric, but computing it requires application-level logic. You can also export infra-level metrics such as SM efficiency. We explain a bit about how we used it to do some optimization:

https://www.trainy.ai/blog/gpu-utilization-misleading
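
To make "requires application logic" concrete, here is a minimal sketch of an MFU estimate. It assumes the common rule of thumb of roughly 6 FLOPs per parameter per token for a forward plus backward pass, which ignores attention FLOPs. The function name and all arguments are hypothetical and not taken from the linked post; the peak FLOP/s figure has to come from your GPU's spec sheet for the precision you train in.

```python
def estimate_mfu(n_params, tokens_per_iter, iter_seconds,
                 peak_flops_per_gpu, n_gpus=1):
    """Rough MFU: achieved training FLOP/s divided by hardware peak FLOP/s.

    Assumption: ~6 FLOPs per parameter per token (forward + backward),
    with attention FLOPs ignored.
    """
    # FLOPs the model needs for one training iteration.
    flops_per_iter = 6 * n_params * tokens_per_iter
    # FLOP/s actually achieved, from the measured wall-clock iteration time.
    achieved_flops_per_sec = flops_per_iter / iter_seconds
    # Fraction of the aggregate peak across all GPUs.
    return achieved_flops_per_sec / (peak_flops_per_gpu * n_gpus)
```

The model size and tokens per iteration are only known inside the training script, which is why this cannot be exported purely at the infra level the way SM efficiency can.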

thundergolfer · 2025-05-21T01:03:31 1747789411

MFU is indeed very useful. Today we found that, while scaling Karpathy’s nanoGPT to multiple H100 nodes, the MFU calculation itself was hurting MFU! [1]

Commenting it out improved per-iteration performance by almost 30%.

1. https://github.com/modal-labs/multinode-training-guide/blob/...