amp.GradScaler() and amp.autocast() question

What are amp.GradScaler() and amp.autocast() good for?

When I use them in my training loop, I don’t see any big difference. Can someone tell me when I should use these two, or how they should help?

The docs on automatic mixed precision explain both objects and their usage:

  • autocast will run eligible operations in float16 (or bfloat16 if specified) to speed up your model and use TensorCores, if available on your GPU
  • GradScaler will scale the loss upward so that small gradients don’t underflow to zero in float16, which could otherwise break your training
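Putting the two together, a minimal training-loop sketch could look like this (the model, data, and hyperparameters are placeholders; `use_amp` disables both objects on CPU, since float16 autocast needs a GPU):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # assumption: only enable AMP when a GPU is present

# Illustrative model and data, not from the original post
model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 10, device=device)
targets = torch.randint(0, 2, (8,), device=device)

for step in range(3):
    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmuls) in float16 inside this block
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    # GradScaler multiplies the loss by a scale factor before backward so
    # small gradients don't underflow, then unscales before the optimizer step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

On a CPU-only run both objects act as no-ops, which is also why you may not see a big difference unless your GPU has TensorCores and the model is large enough to be compute-bound.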