How to keep accuracy when converting a PyTorch model to TensorRT in FP16 mode

I want to use TensorRT FP16 mode to accelerate my PyTorch model, but the converted model loses accuracy. How can I use PyTorch's mixed-precision training to avoid this loss when converting to a TensorRT FP16 engine? I have tried training the model with torch.cuda.amp.autocast, but the converted model still loses some accuracy.

Do you see a loss in accuracy when comparing plain PyTorch training to mixed-precision training with torch.cuda.amp?
How are you exporting the model to TensorRT?

Mixed-precision training is fine; the loss is introduced by the conversion to the TensorRT FP16 engine.
I export the model as PyTorch -> ONNX -> TensorRT.

How large is the accuracy drop after the conversion to TensorRT?
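
One way to put a number on the drop is to run the same inputs through the PyTorch model and the TensorRT engine, then compare the flattened outputs. The sketch below is a minimal illustration using only the Python standard library. The helper names `to_fp16` and `compare_outputs` are hypothetical and not part of PyTorch or TensorRT, and `to_fp16` only emulates FP16 rounding of individual values, so it is not a substitute for running the real engine.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration only.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Magnitudes too large for FP16 (max finite value 65504) overflow.
        return math.copysign(math.inf, x)


def compare_outputs(reference, candidate):
    """Return (max absolute error, max relative error) of two flat sequences.

    Hypothetical helper: `reference` would hold the PyTorch outputs and
    `candidate` the TensorRT FP16 outputs for the same inputs.
    """
    abs_errs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_abs = max(abs_errs)
    # Guard against division by zero for reference values equal to 0.
    max_rel = max(e / max(abs(r), 1e-12) for e, r in zip(abs_errs, reference))
    return max_abs, max_rel


if __name__ == "__main__":
    # Stand-in values; in practice these come from the two models.
    fp32_outputs = [0.1, 1.0, 1000.1, 1e-8]
    fp16_outputs = [to_fp16(v) for v in fp32_outputs]
    max_abs, max_rel = compare_outputs(fp32_outputs, fp16_outputs)
    print("max abs error:", max_abs)
    print("max rel error:", max_rel)
```

Reporting both numbers helps separate ordinary FP16 rounding (small relative error everywhere) from overflow or underflow in specific layers (very large relative error on a few values), which usually need different fixes.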