This article [0] from Nvidia gives a good overview of how mixed precision training works.
At a super high level (from section 3), it boils down to three steps (a code sketch follows the list):
1. Converting the model to use the float16 data type where possible.
2. Keeping float32 master weights to accumulate per-iteration weight updates.
3. Using loss scaling to preserve small gradient values.
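To make the three steps concrete, here is a minimal sketch using PyTorch's `torch.cuda.amp` utilities. This is my own illustration, not code from the Nvidia article: the model, data, and hyperparameters are placeholders, and it assumes a CUDA device is available.

```python
import torch

# Fake data standing in for a real DataLoader (placeholder).
loader = [(torch.randn(32, 512), torch.randint(0, 10, (32,))) for _ in range(10)]

model = torch.nn.Linear(512, 10).cuda()    # parameters stay float32 ("master weights", step 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()       # handles loss scaling (step 3)

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Step 1: autocast runs eligible ops in float16 during the forward pass.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)

    # Step 3: scale the loss so small gradients don't underflow in float16.
    scaler.scale(loss).backward()

    # Step 2: unscale the gradients, then apply the update to the
    # float32 master weights held by the optimizer.
    scaler.step(optimizer)
    scaler.update()
```

Note that PyTorch realizes step 2 without a manual weight copy: the parameters are kept in float32 the whole time, and `autocast` casts them to float16 on the fly inside the forward pass.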