Can an INT8 model produced by PyTorch's QAT training be converted directly to TensorRT? The INT8 model I trained with QAT failed to export to ONNX, so I want to try converting it directly to TensorRT for GPU inference.
Hi @lishanlu136, have you tried the steps outlined in this tutorial: Deploying Quantization Aware Trained models in INT8 using Torch-TensorRT — Torch-TensorRT v1.3.0 documentation?
Thank you, I read the documentation carefully. Does this tutorial use the quantization library provided by TensorRT (the pytorch-quantization toolkit) to run QAT on the PyTorch model?
I think you can try using something along the lines of the following:
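(An untested sketch following the linked tutorial's TorchScript flow — the model, input shape, and fine-tuning step are placeholders you'd replace with your own.)

```python
import torch
import torchvision
import torch_tensorrt
from pytorch_quantization import quant_modules

# Swap torch.nn layers for quantized equivalents *before* building the model,
# so quantize/dequantize (Q/DQ) nodes are inserted for QAT.
quant_modules.initialize()
model = torchvision.models.resnet18(pretrained=True).cuda().eval()

# ... calibrate and fine-tune (QAT) here, as described in the tutorial ...

# Trace the QAT model. The Q/DQ nodes already carry the learned scales,
# so no separate INT8 calibrator is needed at compile time.
dummy_input = torch.randn(1, 3, 224, 224).cuda()
traced_model = torch.jit.trace(model, dummy_input)

# Compile directly to a TensorRT-backed module, no ONNX export involved.
trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},  # let TensorRT run the quantized layers in INT8
)

out = trt_model(dummy_input)
```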
It's still in an early prototype phase, I believe, but in theory that should work if your model is traceable.
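If tracing fails because of data-dependent control flow in the forward pass, torch.jit.script may be worth trying instead; as far as I know, the TorchScript frontend accepts scripted modules as well.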