Subnormal FP16 values detected when converting to TRT

When converting my model to ONNX and then TensorRT, I encountered this issue:

```
[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13706.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13703 + Add_13709 + onnx::Mul_4732_clone_3 + (Unnamed Layer* 7047) [Shuffle] + Mul_13729.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] - Values less than smallest positive FP16 Subnormal value detected. Converting to FP16 minimum subnormalized value.
```

And the results from the FP16 TRT engine are very different from FP32. I tried both TRT 8.4 and 8.2.5; the latter ignored all these warnings, but the results were the same.

I know this is not strictly a PyTorch issue, but it looks like I can tackle it from the PyTorch side. I also know that I can manually keep some of the layers in FP32 to alleviate the issue, but because many layers report this problem, I don’t want to lose too much speed.

Here are some of the things I tried:

  1. I printed the weights of each layer, and indeed many of them fall outside the FP16 range. For example, 1e-10, which is below even the smallest FP16 subnormal (~5.96e-8).
  2. I tried clamping the weights of each layer after every training iteration to force the magnitudes into the FP16-representable range (5.96e-8 ~ 65504), but the results are still wrong.
  3. I tried mixed-precision training, but later realized it may not help, since the master weights are still stored in FP32.
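For reference, the weight audit in step 1 can be scripted. Here is a NumPy sketch (the `audit_fp16` helper is my own name, not a library function; with PyTorch you would loop over `model.named_parameters()` and call it on each tensor) that counts weights falling into the FP16 subnormal range or below it entirely:

```python
import numpy as np

# IEEE 754 binary16 magnitude thresholds
FP16_MIN_NORMAL = 2.0 ** -14     # ~6.10e-5: smallest normal FP16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24  # ~5.96e-8: smallest subnormal FP16 value

def audit_fp16(weights):
    """Count nonzero weights that become subnormal or underflow in FP16."""
    mag = np.abs(weights[weights != 0])
    subnormal = np.sum((mag < FP16_MIN_NORMAL) & (mag >= FP16_MIN_SUBNORMAL))
    underflow = np.sum(mag < FP16_MIN_SUBNORMAL)
    return int(subnormal), int(underflow)

w = np.array([1e-10, 1e-6, 0.5, -3e-5, 0.0], dtype=np.float32)
print(audit_fp16(w))  # → (2, 1): 1e-6 and 3e-5 land in the subnormal range, 1e-10 underflows
```

Subnormal weights survive the cast but with severely reduced precision; anything below ~5.96e-8 is flushed outright, which matches the two kinds of TRT warnings above.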

It seems PyTorch doesn’t have FP16 quantization the way TensorFlow does, but is there anything I can do to make the model more FP16-compatible?

Thanks!

It turns out that even if I clamp the weights to the FP16 normal range (minimum 6.1e-5), as long as the model still runs in FP32 precision, the output is still good.

This means the difference comes from the lossy cast from FP32 to FP16. So is there any way to do FP16-aware training or calibration in PyTorch?
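One way to both see what the TRT cast does and approximate it during training is to round-trip the weights through FP16. The sketch below (the `fp16_roundtrip` name is mine; this is not a PyTorch API) projects FP32 weights onto the FP16 grid. Applying such a projection to the weights after each optimizer step would let training see the same values the FP16 engine will, in the spirit of quantization-aware training:

```python
import numpy as np

def fp16_roundtrip(w):
    """Project FP32 weights onto the FP16 grid, as an FP32->FP16 cast would."""
    return w.astype(np.float16).astype(np.float32)

w = np.array([1e-10, 5e-8, 1e-6, 0.5], dtype=np.float32)
w16 = fp16_roundtrip(w)
# 1e-10 flushes to exactly 0; 5e-8 snaps to the nearest subnormal (~5.96e-8);
# 1e-6 survives only as a subnormal with ~1% rounding error; 0.5 is exact.
print(w16)
```

This is only a weight-side approximation: activations and accumulations in the real FP16 engine introduce further error that this projection does not model.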

The only thing you can do is protect parts of your graph by casting them to FP32. Since it’s the model’s weights that are the issue here, it means some of those weights should not be converted to FP16. That requires a manual FP16 conversion…
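To illustrate why protecting a layer works, here is a toy NumPy sketch (weight scale chosen artificially small for the demonstration): a matmul whose weights sit below the FP16 subnormal minimum loses everything when run in FP16, but is fine when that one op is kept in FP32:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
# Weights on the order of 1e-10: below the smallest FP16 subnormal (~5.96e-8)
w = (rng.standard_normal((8, 8)) * 1e-10).astype(np.float32)

ref = x @ w  # FP32 reference output

# Naive full-FP16 layer: the weights flush to exactly zero in the cast,
# so the layer's output is destroyed.
y_fp16 = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

# "Protected" layer: keep this matmul in FP32. Note the output itself is
# also tiny here, so downstream ops must stay FP32 until magnitudes recover.
y_protected = x @ w

print(np.abs(y_fp16).max())           # all information lost
print(np.allclose(y_protected, ref))  # matches the FP32 reference
```

In a TensorRT build this corresponds to pinning the precision of the affected layers to FP32 while leaving the rest of the network in FP16, which is why it costs some speed only on those layers.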

Thank you for your reply! Here is the thing: from my search results and answers from peers, it seems our hands are more tied with FP16 than with INT8. We have quantization tooling for INT8 when the conversion is poor, but for FP16 the only suggestion I’m getting is to leave the affected layers untouched. Is it because certain layers simply have to run in FP32 (if so, could you please give some examples), so this is fundamentally not a solvable problem? Or is it because FP16 conversion has conventionally been so easy and reliable that nobody has dug into this kind of corner case before?

Thanks!