amp.GradScaler() and amp.autocast() question

What are amp.GradScaler() and amp.autocast() good for?

When I use them in my training loop I don’t see any big difference. Can someone tell me when I should use these two, or how they are supposed to help?

The docs on automatic mixed precision explain both objects and their usage.
TL;DR:

  • autocast will run eligible operations in float16 (or bfloat16 if specified) where it is safe to do so, to speed up your model and use Tensor Cores if available on your GPU
  • GradScaler scales the loss before the backward pass to prevent small float16 gradients from underflowing to zero, which could otherwise break your training
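
Putting the two together, a typical training loop looks like the sketch below. This is only a minimal illustration, assuming a recent PyTorch version with `torch.cuda.amp`; the toy model, optimizer, and random data are placeholders, not part of the original question.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    # random toy data, purely for illustration
    inputs = torch.randn(32, 128, device=device)
    targets = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()

    # autocast runs eligible ops (e.g. the matmul in nn.Linear) in float16
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # scale the loss so small float16 gradients don't underflow to zero;
    # step() unscales the gradients before the optimizer update, and
    # update() adjusts the scale factor for the next iteration
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Note that the speedup depends on your hardware and model: on a GPU with Tensor Cores and a model dominated by large matmuls or convolutions you should see a clear difference, while a tiny model like the one above may show none at all.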