amp.GradScaler() and amp.autocast() question

What are amp.GradScaler() and amp.autocast() good for?

When I use them in my training loop I don’t see any big difference. Can someone tell me when I should use these two, or how they are supposed to help?

The docs on automatic mixed precision explain both objects and their usage.
TL;DR:

  • autocast will run eligible operations in float16 (or bfloat16 if specified) where it is safe to do so, to speed up your model and use Tensor Cores if available on your GPU
  • GradScaler scales the loss before the backward pass to prevent small float16 gradients from underflowing to zero, which could otherwise break your training
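
Putting the two together, a typical training loop looks like the sketch below. This is only a minimal illustration, assuming a recent PyTorch version with `torch.cuda.amp`; the toy model, optimizer, and random data are placeholders, not part of the original question.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    # random toy data, purely for illustration
    inputs = torch.randn(32, 128, device=device)
    targets = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()

    # autocast runs eligible ops (e.g. the matmul in nn.Linear) in float16
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # scale the loss so small float16 gradients don't underflow to zero;
    # step() unscales the gradients before the optimizer update, and
    # update() adjusts the scale factor for the next iteration
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Note that the speedup depends on your hardware and model: on a GPU with Tensor Cores and a model dominated by large matmuls or convolutions you should see a clear difference, while a tiny model like the one above may show none at all.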