Yes, a direct cast to float16 will overflow and create invalid values. During mixed-precision training with float16 this could happen if the loss scaling factor is too large, causing the gradients to overflow.
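For illustration, here is a quick example of the overflow on a direct cast; float16 can only represent magnitudes up to roughly 65504, so anything larger becomes inf:

```python
import torch

# Values above float16's maximum (~65504) overflow to inf when cast directly,
# which is what can happen to gradients if the loss scale is too large.
x = torch.tensor([70000.0], dtype=torch.float32)
print(x.half())  # tensor([inf], dtype=torch.float16)
```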
The scaler.step(optimizer) call skips the optimizer.step() call if invalid gradients are detected, and the subsequent scaler.update() call decreases the scaling factor until the gradients contain valid values again.
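In case a runnable reference helps, here is a minimal sketch of that loop, assuming a CUDA device is available and using a toy model, optimizer, and loss that are just placeholders for your own:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model, optimizer, and data for demonstration only.
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
scaler = GradScaler()

data = torch.randn(4, 10, device='cuda')
target = torch.randn(4, 1, device='cuda')

for _ in range(3):
    optimizer.zero_grad()
    with autocast():
        loss = loss_fn(model(data), target)
    # Scale the loss before backward to keep small gradients representable in float16.
    scaler.scale(loss).backward()
    # step() is skipped internally if the unscaled gradients contain inf/nan.
    scaler.step(optimizer)
    # update() lowers the scale factor after a skipped step, otherwise it may grow it.
    scaler.update()
    print(scaler.get_scale())
```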
Hello, I would like to ask: if I use fp16 for the convolution operation during inference, is the result after the multiplications and additions fp32 or fp16? If it is fp32, how does PyTorch truncate it to fp16? Is there any specific code for this part that I can refer to? Looking forward to your reply.