[quantization] Frequently Asked Questions

Quantization Docs

Main Doc: Quantization — PyTorch master documentation

API Reference: Quantization API Reference — PyTorch master documentation

Common Errors

Please check the Common Errors section in the main doc: Quantization — PyTorch master documentation

Examples (a sketch after this list shows the typical cause of the first two):

RuntimeError: Could not run 'quantized::some_operator' with arguments from the 'CPU' backend...
RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
AttributeError: 'LinearPackedParams' object has no attribute '_modules'
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
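The first two errors usually mean a tensor crossed the float/quantized boundary without being converted: quantized ops only accept quantized tensors, and regular aten ops only accept float tensors. The TraceError comes from FX graph mode symbolically tracing data-dependent control flow; see the main doc for workarounds. Below is a minimal eager mode sketch of the boundary issue (the model and shapes are made up for illustration; on older releases these APIs live under torch.quantization rather than torch.ao.quantization):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized at the model boundary
        self.conv = nn.Conv2d(3, 3, 1)
        self.dequant = DeQuantStub()  # quantized -> float at the model boundary

    def forward(self, x):
        # Skipping self.quant here reproduces:
        #   Could not run 'quantized::...' with arguments from the 'CPU' backend
        x = self.quant(x)
        x = self.conv(x)
        # Skipping self.dequant before a float-only op reproduces:
        #   Could not run 'aten::...' with arguments from the 'QuantizedCPU' backend
        return self.dequant(x)

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")
m = prepare(m)
m(torch.randn(1, 3, 4, 4))  # calibration pass
m = convert(m)
m(torch.randn(1, 3, 4, 4))  # runs the quantized conv on CPU
```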

Quantized Inference on GPU

We don’t have official GPU support yet, but there are two prototypes:

(1) PyTorch quantization + fx2trt lowering, with inference in TensorRT (A100 and later GPUs): see the examples in TensorRT/test_quant_trt.py at master · pytorch/TensorRT · GitHub

(2) Integration with cuDNN through native quantized CUDA ops: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub. This project is an early prototype and has been paused.
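For context, the PyTorch-side half of prototype (1) is an ordinary FX graph mode post training quantization flow, and the quantized GraphModule it produces is what fx2trt lowers to a TensorRT engine. Below is a minimal sketch of that first half (PyTorch 1.13-style API; note the linked test additionally uses a TensorRT-specific backend config and a reference quantized model before lowering, which is omitted here):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Insert observers, calibrate, then convert to a quantized GraphModule.
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
prepared(*example_inputs)         # calibration with representative data
quantized = convert_fx(prepared)  # this GraphModule is the input to fx2trt
```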

ONNX Support for my quantized model

Supporting export of quantized models to ONNX is not a priority for PyTorch quantization. Please open an issue in GitHub - onnx/onnx: Open standard for machine learning interoperability when you encounter problems with ONNX export, or reach out to the people in this list: PyTorch Governance | Maintainers — PyTorch 1.12 documentation

LSTM quantization support

LSTM is supported through our custom module API in both eager mode and FX graph mode quantization; a minimal eager mode sketch follows the links below.
Eager Mode: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub
FX Graph Mode: pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub
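For a rough idea of the eager mode flow: during prepare, nn.LSTM is swapped for an observed custom module, and during convert that observed module is swapped for its quantized counterpart. This sketch assumes the default custom module mapping (nn.LSTM → torch.ao.nn.quantizable.LSTM) that recent releases ship with; module paths and defaults vary across versions, so treat the linked tests as the authoritative reference:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()    # the custom module LSTM expects quantized inputs after convert
        self.lstm = nn.LSTM(input_size=8, hidden_size=8)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x, hidden = self.lstm(x)    # LSTM returns (output, (h, c))
        return self.dequant(x)

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")
m = prepare(m)           # nn.LSTM is replaced by the observed custom module
m(torch.randn(4, 1, 8))  # calibration: (seq_len, batch, input_size)
m = convert(m)           # the observed LSTM is replaced by its quantized counterpart
m(torch.randn(4, 1, 8))
```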