When converting my model to ONNX and then TensorRT, I encountered this issue:
[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13706.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13703 + Add_13709 + onnx::Mul_4732_clone_3 + (Unnamed Layer* 7047) [Shuffle] + Mul_13729.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] - Values less than smallest positive FP16 Subnormal value detected. Converting to FP16 minimum subnormalized value.
The results from the FP16 TRT engine are very different from FP32. I tried both TRT 8.4 and 8.2.5; the latter did not print these warnings, but the results were the same.
I know this is not strictly a PyTorch issue, but it looks like something I can tackle from the PyTorch side. I also know that I can manually keep some of the layers in FP32 to alleviate the problem, but since many layers report this warning, I don't want to lose too much speed.
Here are some of the things I tried:
- I printed the values of the weights of each layer, and many of them indeed fall below the FP16 range (for example, 1e-10, which is smaller than the smallest positive FP16 subnormal, ~5.96e-8).
- I tried to clamp the weights of each layer after each iteration during training, forcing their magnitudes into the representable FP16 range (5.96e-8 to 65504), but the results are still wrong.
- I tried mixed-precision training, but later realized it may not help, since the weights are still stored in FP32.
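For reference, the weight check I ran looks roughly like this (a sketch; the helper name and threshold constant are mine, the threshold comes from `torch.finfo`):

```python
import torch
import torch.nn as nn

# Smallest positive *normal* FP16 value; nonzero magnitudes below this
# (but above ~5.96e-8) become subnormal when cast to FP16.
FP16_MIN_NORMAL = torch.finfo(torch.float16).tiny  # ~6.1e-5

def report_subnormal_weights(model: nn.Module) -> None:
    """Count, per parameter, the nonzero values that would be FP16-subnormal."""
    for name, param in model.named_parameters():
        mags = param.detach().abs()
        nonzero = mags[mags > 0]
        n_sub = int((nonzero < FP16_MIN_NORMAL).sum())
        if n_sub:
            print(f"{name}: {n_sub}/{nonzero.numel()} values below the FP16 normal range")

# Toy demonstration: force some tiny weights like the ones TRT warned about.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 8, 3))
with torch.no_grad():
    model[0].weight.mul_(1e-7)
report_subnormal_weights(model)
```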
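The clamping I tried was roughly this (a sketch; the helper name is mine, and it is called after each `optimizer.step()`):

```python
import torch
import torch.nn as nn

# Keep magnitudes inside the representable positive FP16 range:
# smallest subnormal (~5.96e-8) up to the FP16 maximum (65504).
FP16_MIN_SUB = 5.96e-8
FP16_MAX = 65504.0

@torch.no_grad()
def clamp_weights_fp16_range(model: nn.Module) -> None:
    for param in model.parameters():
        sign = param.sign()
        mag = param.abs().clamp(min=FP16_MIN_SUB, max=FP16_MAX)
        # Exact zeros stay zero, because sign() is 0 there.
        param.copy_(sign * mag)

# Toy training step showing where the clamp is applied.
model = nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
clamp_weights_fp16_range(model)
```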
It seems like PyTorch doesn't have FP16 quantization the way TensorFlow does, but is there anything I can do to make the model more FP16-compatible?
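To make the question concrete, the closest thing I can emulate in PyTorch is rounding the weights through FP16 and back (a sketch of my own, not a real quantization API), which at least lets me measure how much divergence comes from weight rounding alone:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fake_fp16_weights(model: nn.Module) -> None:
    """Round every parameter through FP16 and back, so the FP32 model
    carries exactly the values an FP16 engine would load (tiny values
    flush toward zero, values above 65504 saturate/overflow)."""
    for param in model.parameters():
        param.copy_(param.half().float())

# Compare the original FP32 output against the FP16-rounded copy.
model = nn.Linear(16, 16)
x = torch.randn(4, 16)
ref = model(x)
fake_fp16_weights(model)
print((model(x) - ref).abs().max())
```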
Thanks!