Can an INT8 model produced by PyTorch's QAT training be converted directly to TensorRT? The INT8 model I trained with QAT failed to export to ONNX, so I want to try converting it directly to TensorRT for GPU inference.
Hi @lishanlu136, have you tried the steps outlined in this tutorial: Deploying Quantization Aware Trained models in INT8 using Torch-TensorRT — Torch-TensorRT v1.3.0 documentation?
Thank you, I read the documentation carefully. Does this tutorial use the quantization library provided by TensorRT (the pytorch-quantization toolkit) to run QAT on the PyTorch model?
I think you can try using something along the lines of the following:
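(An untested sketch following the linked tutorial's TorchScript flow — the model, input shape, and fine-tuning step are placeholders you'd replace with your own.)

```python
import torch
import torchvision
import torch_tensorrt
from pytorch_quantization import quant_modules

# Swap torch.nn layers for quantized equivalents *before* building the model,
# so quantize/dequantize (Q/DQ) nodes are inserted for QAT.
quant_modules.initialize()
model = torchvision.models.resnet18(pretrained=True).cuda().eval()

# ... calibrate and fine-tune (QAT) here, as described in the tutorial ...

# Trace the QAT model. The Q/DQ nodes already carry the learned scales,
# so no separate INT8 calibrator is needed at compile time.
dummy_input = torch.randn(1, 3, 224, 224).cuda()
traced_model = torch.jit.trace(model, dummy_input)

# Compile directly to a TensorRT-backed module, no ONNX export involved.
trt_model = torch_tensorrt.compile(
    traced_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},  # let TensorRT run the quantized layers in INT8
)

out = trt_model(dummy_input)
```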
It's still in an early prototype phase, I believe, but in theory that should work if your model is traceable.
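If tracing fails because of data-dependent control flow in the forward pass, torch.jit.script may be worth trying instead; as far as I know, the TorchScript frontend accepts scripted modules as well.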