This article [0] from Nvidia gives a good overview of how mixed precision training works.
At a super high level (from section 3), it boils down to three steps (a code sketch follows the list):
1. Converting the model to use the float16 data type where possible.
2. Keeping float32 master weights to accumulate per-iteration weight updates.
3. Using loss scaling to preserve small gradient values.
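To make the three steps concrete, here is a minimal sketch using PyTorch's `torch.cuda.amp` utilities. This is my own illustration, not code from the Nvidia article: the model, data, and hyperparameters are placeholders, and it assumes a CUDA device is available.

```python
import torch

# Fake data standing in for a real DataLoader (placeholder).
loader = [(torch.randn(32, 512), torch.randint(0, 10, (32,))) for _ in range(10)]

model = torch.nn.Linear(512, 10).cuda()    # parameters stay float32 ("master weights", step 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()       # handles loss scaling (step 3)

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Step 1: autocast runs eligible ops in float16 during the forward pass.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)

    # Step 3: scale the loss so small gradients don't underflow in float16.
    scaler.scale(loss).backward()

    # Step 2: unscale the gradients, then apply the update to the
    # float32 master weights held by the optimizer.
    scaler.step(optimizer)
    scaler.update()
```

Note that PyTorch realizes step 2 without a manual weight copy: the parameters are kept in float32 the whole time, and `autocast` casts them to float16 on the fly inside the forward pass.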