xuehy
(Guessit)
July 21, 2023, 5:58am
1
I ran quantization-aware training in PyTorch and converted the model to a quantized one with torch.ao.quantization.convert. I know PyTorch does not yet support inference of the quantized model on GPU; however, is there a way to convert the quantized PyTorch model into TensorRT?
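For reference, my eager-mode flow looked roughly like this (a simplified sketch; MyModel and the training loop are placeholders, not my actual code):

import torch
import torch.ao.quantization as tq

model = MyModel()  # hypothetical model
model.train()
# attach a QAT qconfig and insert fake-quant observers
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)

# ... the usual training loop, fine-tuning with fake quantization ...

model.eval()
# swap fake-quant modules for real quantized kernels (CPU backends only)
quantized = tq.convert(model, inplace=False)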
I tried torch-tensorrt following the guide at pytorch/TensorRT: PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT (github.com). However, the conversion failed with the following errors:
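(For context, the compile call I attempted was along these lines; the input shape is just an example, and quantized is the converted model from above:)

import torch
import torch_tensorrt

# TorchScript the quantized model, then ask Torch-TensorRT to compile it
scripted = torch.jit.trace(quantized, torch.randn(1, 3, 224, 224))
trt_mod = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
)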
HDCharles
(Hd Charles)
July 21, 2023, 10:02pm
2
I don’t think you can just take an eager-mode quantized model and lower it to TRT; TRT has its own quantization stuff (https://github.com/pytorch/TensorRT/tree/main/examples/int8/ptq) that might work, but the native PyTorch-to-TRT lowering is still an early prototype at the moment (Quantization — PyTorch main documentation). Maybe @jerryzh168 knows if there’s an easy way to do this atm?
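If you go the TRT-native PTQ route, the calibrator API in torch-tensorrt looks roughly like this (a sketch, assuming you have a float TorchScript module and a DataLoader of representative inputs; the names and shapes are placeholders):

import torch
import torch_tensorrt

# calib_loader is a hypothetical DataLoader yielding representative inputs
calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    calib_loader,
    cache_file="./calibration.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

trt_mod = torch_tensorrt.compile(
    scripted_fp32_model,  # hypothetical float TorchScript module
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
    calibrator=calibrator,
)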
HDCharles
(Hd Charles)
July 21, 2023, 10:39pm
3
see the example here (from the PyTorch FX/TRT test suite; the self.* attributes and check helpers come from the test class):
prepared = prepare(
    m,
    {"": self.trt_qconfig},
    example_inputs,
    backend_config=self.trt_backend_config_dict,
)
self.checkGraphModuleNodes(prepared, expected_node_occurrence=no_prepare)
# calibration
prepared(*inputs)
quantized = convert_to_reference_fx(
    prepared,
    backend_config=self.trt_backend_config_dict,
)
self.checkGraphModuleNodes(quantized, expected_node_occurrence=no_convert)
# lower to trt
trt_mod = lower_to_trt(quantized, inputs, shape_ranges)
You can try that, though it uses FX quantization, not eager mode.
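For comparison, the plain FX flow (with the default qconfig mapping instead of the TRT backend config) looks roughly like this; the model and input shape are placeholders:

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_to_reference_fx

model = MyModel().eval()  # hypothetical model
example_inputs = (torch.randn(1, 3, 224, 224),)

# insert observers according to the qconfig mapping
prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"), example_inputs)
prepared(*example_inputs)  # calibration pass
# produce a reference quantized model (quantize/dequantize + float ops)
reference = convert_to_reference_fx(prepared)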
xuehy
(Guessit)
July 24, 2023, 2:02am
4
I already used FX quantization, but the conversion still failed. Is that because the PyTorch quantization functionality is at an early stage? Will converting PyTorch quantized models to TensorRT get easier in the future? Do you have any recommended tools for PyTorch quantization and TensorRT deployment right now?
HDCharles
(Hd Charles)
July 24, 2023, 4:06pm
5
Do those tests I linked pass for your setup?
If they do, then there’s something about your model that’s the issue; if not, then it’s something in your setup.
If it’s the former, provide a repro and we can take a deeper look. If it’s the latter, you’d probably need to ask the TensorRT folks.
xuehy
(Guessit)
July 25, 2023, 6:01am
6
The test failed.
It seems that PyTorch’s instancenorm module contains conditional statements like if input.dim() not in (3, 4) that cannot be traced. Is this a bug in PyTorch itself?
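A minimal illustration of why that kind of shape check breaks FX symbolic tracing (the same pattern, not the exact InstanceNorm code path):

import torch
import torch.fx

def f(x):
    # During symbolic tracing, x.dim() returns a Proxy, and using a Proxy
    # in a boolean condition raises a TraceError.
    if x.dim() not in (3, 4):
        raise ValueError("expected 3D or 4D input")
    return x * 2

try:
    torch.fx.symbolic_trace(f)
except torch.fx.proxy.TraceError as e:
    print(e)  # "symbolically traced variables cannot be used as inputs to control flow"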
xuehy
(Guessit)
July 26, 2023, 5:48am
8
Thanks to everyone who responded to my question.
I gave up on using PyTorch’s own quantization stuff.
I finally quantized my model successfully and converted it to ONNX and then TensorRT using the pytorch-quantization package (pytorch-quantization · PyPI), onnx, and NVIDIA/TensorRT (github.com).
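The flow was roughly as follows (a sketch of what worked for me; the model, shapes, and file names are placeholders):

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# monkey-patch torch.nn layers with quantized counterparts
quant_modules.initialize()
model = MyModel().cuda().eval()  # hypothetical model

# ... calibrate (or run QAT) here, per the pytorch-quantization docs ...

# export fake-quant nodes as ONNX Q/DQ ops
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "model_qdq.onnx", opset_version=13)

The engine can then be built with trtexec, e.g. trtexec --onnx=model_qdq.onnx --int8 --saveEngine=model.plan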
Hopefully quantization and its deployment will become easier within PyTorch in the future.