How to handle values outside the fp16 range when casting?

I know fp32 and fp16 have different ranges.

How does PyTorch handle a tensor whose values fall outside the fp16 range when casting?

For example, x = torch.Tensor([66666])

If it casts x to inf, does this mean the gradient becomes NaN and the training will fail?
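
For reference, a quick check of this example (float16's largest finite value is 65504, so 66666 overflows when cast down):

```python
import torch

x = torch.Tensor([66666])  # float32 tensor with a value above the fp16 max (65504)
y = x.half()               # cast to float16

print(y)               # tensor([inf], dtype=torch.float16)
print(torch.isinf(y))  # tensor([True])
```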

Yes, a direct cast to float16 will overflow and create invalid values. During mixed-precision training with float16 this can happen if the loss scaling factor is too large, so that the scaled gradients overflow.
The scaler.step(optimizer) call skips the optimizer.step() call if invalid gradients are detected, and the subsequent scaler.update() call decreases the scaling factor until the gradients contain valid values again.
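
To make the skip behavior concrete, here is a minimal sketch (assuming a CUDA device is available; the overflow is simulated by manually injecting an inf gradient rather than relying on a real overflow):

```python
import torch

device = "cuda"  # assumes a CUDA device is available
model = torch.nn.Linear(4, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(2, 4, device=device)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).sum()

scaler.scale(loss).backward()

# Simulate an overflow by poisoning one gradient with inf.
next(model.parameters()).grad.fill_(float("inf"))

params_before = [p.detach().clone() for p in model.parameters()]
old_scale = scaler.get_scale()

scaler.step(optimizer)  # detects the invalid gradient and skips optimizer.step()
scaler.update()         # decreases the scale (by backoff_factor, 0.5 by default)

print(all(torch.equal(b, p) for b, p in zip(params_before, model.parameters())))  # True: params unchanged
print(scaler.get_scale() < old_scale)  # True: scale was reduced
```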


Hi @ptrblck, thanks for your reply. Is it possible to post a link to the source code for the skip behavior? I want to explore the details.

Yes, you can take a look at GradScaler.step and _maybe_opt_step to see the implementation.
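
For orientation, the gist of the check is roughly the following (a simplified paraphrase of the GradScaler logic, not the verbatim source): optimizer.step() only runs if no device reported an inf/NaN in the gradients.

```python
# Simplified paraphrase of GradScaler._maybe_opt_step (not verbatim source):
def _maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs):
    retval = None
    # found_inf_per_device holds one flag tensor per device; any nonzero
    # entry means an inf/NaN gradient was found, so the step is skipped.
    if not sum(v.item() for v in optimizer_state["found_inf_per_device"].values()):
        retval = optimizer.step(*args, **kwargs)
    return retval
```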


Got it, thanks a lot!