[quantization] Frequently Asked Questions

Quantization Docs

Main Doc: Quantization — PyTorch master documentation

API Reference: Quantization API Reference — PyTorch master documentation

Common Errors

Please check the Common Errors section in the main doc: Quantization — PyTorch master documentation

Examples (a sketch after this list shows the typical cause of the first two):

RuntimeError: Could not run 'quantized::some_operator' with arguments from the 'CPU' backend...
RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
AttributeError: 'LinearPackedParams' object has no attribute '_modules'
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
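The first two errors usually mean a tensor crossed the float/quantized boundary without being converted: quantized ops only accept quantized tensors, and regular aten ops only accept float tensors. The TraceError comes from FX graph mode symbolically tracing data-dependent control flow; see the main doc for workarounds. Below is a minimal eager mode sketch of the boundary issue (the model and shapes are made up for illustration; on older releases these APIs live under torch.quantization rather than torch.ao.quantization):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized at the model boundary
        self.conv = nn.Conv2d(3, 3, 1)
        self.dequant = DeQuantStub()  # quantized -> float at the model boundary

    def forward(self, x):
        # Skipping self.quant here reproduces:
        #   Could not run 'quantized::...' with arguments from the 'CPU' backend
        x = self.quant(x)
        x = self.conv(x)
        # Skipping self.dequant before a float-only op reproduces:
        #   Could not run 'aten::...' with arguments from the 'QuantizedCPU' backend
        return self.dequant(x)

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")
m = prepare(m)
m(torch.randn(1, 3, 4, 4))  # calibration pass
m = convert(m)
m(torch.randn(1, 3, 4, 4))  # runs the quantized conv on CPU
```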

Quantized Inference on GPU

We don’t have official GPU support yet, but there are two prototypes:

(1) PyTorch quantization + fx2trt lowering, with inference in TensorRT (A100 and later GPUs): see the examples in TensorRT/test_quant_trt.py at master · pytorch/TensorRT · GitHub

(2) Integration with cuDNN through native quantized CUDA ops: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub. This project is an early prototype and has been paused.
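For context, the PyTorch-side half of prototype (1) is an ordinary FX graph mode post training quantization flow, and the quantized GraphModule it produces is what fx2trt lowers to a TensorRT engine. Below is a minimal sketch of that first half (PyTorch 1.13-style API; note the linked test additionally uses a TensorRT-specific backend config and a reference quantized model before lowering, which is omitted here):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Insert observers, calibrate, then convert to a quantized GraphModule.
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
prepared(*example_inputs)         # calibration with representative data
quantized = convert_fx(prepared)  # this GraphModule is the input to fx2trt
```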

ONNX Support for my quantized model

Supporting export of quantized models to ONNX is not a priority for PyTorch quantization. Please open an issue in GitHub - onnx/onnx: Open standard for machine learning interoperability when you encounter problems with ONNX export, or reach out to the people in this list: PyTorch Governance | Maintainers — PyTorch 1.12 documentation

LSTM quantization support

LSTM is supported through our custom module API in both eager mode and FX graph mode quantization; a minimal eager mode sketch follows the links below.
Eager Mode: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub
FX Graph Mode: pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub
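For a rough idea of the eager mode flow: during prepare, nn.LSTM is swapped for an observed custom module, and during convert that observed module is swapped for its quantized counterpart. This sketch assumes the default custom module mapping (nn.LSTM → torch.ao.nn.quantizable.LSTM) that recent releases ship with; module paths and defaults vary across versions, so treat the linked tests as the authoritative reference:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()    # the custom module LSTM expects quantized inputs after convert
        self.lstm = nn.LSTM(input_size=8, hidden_size=8)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x, hidden = self.lstm(x)    # LSTM returns (output, (h, c))
        return self.dequant(x)

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")
m = prepare(m)           # nn.LSTM is replaced by the observed custom module
m(torch.randn(4, 1, 8))  # calibration: (seq_len, batch, input_size)
m = convert(m)           # the observed LSTM is replaced by its quantized counterpart
m(torch.randn(4, 1, 8))
```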