I know that fp32 and fp16 have different ranges.
How does PyTorch handle a tensor whose values fall outside the fp16 range when casting?
For example, x = torch.tensor(100000.0), which is larger than the fp16 maximum of 65504.
If the cast turns x into inf, does this mean the gradient will be NaN and the training will fail?
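To make the question concrete, here is a small illustration using NumPy's float16, which uses the same IEEE 754 binary16 format as torch.float16 (the value 100000.0 is just an arbitrary out-of-range example, not from the original post):

```python
import numpy as np

# fp16 can represent values only up to 65504.
fp16_max = np.finfo(np.float16).max   # 65504.0

x = np.float32(100000.0)   # perfectly fine in fp32
y = np.float16(x)          # out of range in fp16: overflows to inf

print(fp16_max)            # 65504.0
print(np.isinf(y))         # True
```

Once a value becomes inf, any arithmetic mixing it with other values can produce further inf/NaN results, which is why overflowing activations or gradients can poison a training step.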
Yes, a direct cast to float16 will overflow and create invalid values. During mixed-precision training with float16 this can happen if the loss scaling factor is too large, causing the gradients to overflow. In that case the scaler.step(optimizer) call skips the underlying optimizer.step() call if invalid gradients are detected, and the scaler decreases the scaling factor until the gradients contain valid values again.
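The skip-and-backoff behavior described above can be sketched in pure Python. This is a hypothetical, heavily simplified mimic of the dynamics, not PyTorch's actual GradScaler implementation (the class name and defaults here are my own, though the default factors mirror the documented GradScaler defaults):

```python
import math

class ToyGradScaler:
    """Simplified sketch: skip the step on inf/NaN gradients, shrink the
    scale on overflow, and grow it back after a run of clean steps."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, grads, apply_step):
        # Returns True if invalid gradients were found (step skipped).
        found_inf = any(math.isinf(g) or math.isnan(g) for g in grads)
        if not found_inf:
            apply_step()  # analogous to calling optimizer.step()
        return found_inf

    def update(self, found_inf):
        if found_inf:
            self.scale *= self.backoff_factor  # shrink after overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= self.growth_factor  # grow after clean run
```

With this sketch, a step seeing an inf gradient leaves the parameters untouched and halves the scale, while subsequent valid steps are applied and eventually grow the scale back.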
Hi @ptrblck, thanks for your reply. Is it possible to paste a link to the source code of the skip behavior? I want to explore the details.
Yes, you can take a look at _maybe_opt_step to see the implementation.
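For a sense of what to expect there, the decision boils down to a per-device overflow check. This is a hypothetical, stripped-down version of that check (the function and parameter names below are my own, chosen to mirror the per-device bookkeeping the real scaler keeps), not the actual PyTorch source:

```python
def maybe_opt_step(optimizer_step, found_inf_per_device):
    """Run the optimizer step only if no device reported an inf/NaN
    gradient; otherwise skip this iteration entirely."""
    # found_inf_per_device maps a device name to a 0/1 overflow flag.
    if sum(found_inf_per_device.values()) == 0:
        return optimizer_step()  # gradients are valid: take the step
    return None                  # overflow detected: skip the step
```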
Hello, I would like to ask: if I use fp16 for the convolution operations during inference, is the result after the multiply-accumulate fp32 or fp16? If it is fp32, how does PyTorch truncate it back to fp16? Is there any specific code for this part that I can refer to? Looking forward to your reply.
Take a look at the Automatic Mixed Precision package docs to see how PyTorch applies the casts to different operations.
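As for why the accumulator precision matters: GPU kernels for fp16 GEMMs/convolutions typically multiply in fp16 but accumulate in fp32. Here is a hypothetical NumPy illustration (not PyTorch internals) of what goes wrong if you accumulate many small fp16 products directly in fp16:

```python
import numpy as np

v = np.float16(0.1)        # one fp16 "product" from a convolution

acc16 = np.float16(0.0)    # fp16 accumulator
acc32 = np.float32(0.0)    # fp32 accumulator
for _ in range(4096):
    acc16 = np.float16(acc16 + v)  # rounds to fp16 after every add
    acc32 = acc32 + np.float32(v)  # keeps full fp32 precision

# The true sum is about 409.5. The fp16 accumulator stalls once the
# running sum grows large enough that adding 0.1 no longer changes it,
# while the fp32 accumulator stays close to the true value.
print(float(acc16), float(acc32))
```

The fp32 result is then cast back down to fp16 for the output tensor, which is a single rounding step rather than thousands of them.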