Yes, a direct cast to float16 will overflow and create invalid values. During mixed-precision training with float16 this could happen if the loss scaling factor is too large, causing the gradients to overflow.
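For illustration, here is a quick example of the overflow on a direct cast; float16 can only represent magnitudes up to roughly 65504, so anything larger becomes inf:

```python
import torch

# Values above float16's maximum (~65504) overflow to inf when cast directly,
# which is what can happen to gradients if the loss scale is too large.
x = torch.tensor([70000.0], dtype=torch.float32)
print(x.half())  # tensor([inf], dtype=torch.float16)
```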
The scaler.step(optimizer) call skips the optimizer.step() call if invalid gradients are detected, and the subsequent scaler.update() call decreases the scaling factor until the gradients contain valid values again.
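In case a runnable reference helps, here is a minimal sketch of that loop, assuming a CUDA device is available and using a toy model, optimizer, and loss that are just placeholders for your own:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model, optimizer, and data for demonstration only.
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
scaler = GradScaler()

data = torch.randn(4, 10, device='cuda')
target = torch.randn(4, 1, device='cuda')

for _ in range(3):
    optimizer.zero_grad()
    with autocast():
        loss = loss_fn(model(data), target)
    # Scale the loss before backward to keep small gradients representable in float16.
    scaler.scale(loss).backward()
    # step() is skipped internally if the unscaled gradients contain inf/nan.
    scaler.step(optimizer)
    # update() lowers the scale factor after a skipped step, otherwise it may grow it.
    scaler.update()
    print(scaler.get_scale())
```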
Hello, I would like to ask: if I use fp16 for the convolution operation during inference, is the result after the multiplications and additions fp32 or fp16? If it is fp32, how does PyTorch truncate it to fp16? Is there any specific code for this part that I can refer to? Looking forward to your reply.