Quantization Docs
Main Doc: Quantization — PyTorch master documentation
API Reference: Quantization API Reference — PyTorch master documentation
Common Errors
Please check the common errors section in: Quantization — PyTorch master documentation
Examples:
RuntimeError: Could not run 'quantized::some_operator' with arguments from the 'CPU' backend...
RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
AttributeError: 'LinearPackedParams' object has no attribute '_modules'
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
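The first two errors usually mean a quantized module received a regular fp32 tensor (or vice versa). A minimal sketch of the standard fix in eager mode, wrapping the model in QuantStub/DeQuantStub so inputs are quantized before hitting quantized ops (the module `M` and its sizes here are illustrative, not from the docs):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # fp32 -> quint8
        self.linear = torch.nn.Linear(4, 4)
        self.dequant = torch.ao.quantization.DeQuantStub()  # quint8 -> fp32

    def forward(self, x):
        return self.dequant(self.linear(self.quant(x)))

m = M().eval()
m.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
torch.ao.quantization.prepare(m, inplace=True)
m(torch.randn(8, 4))                    # calibrate with sample data
torch.ao.quantization.convert(m, inplace=True)
out = m(torch.randn(2, 4))              # works: QuantStub quantizes the input
```

Calling `m.linear` directly with an fp32 tensor after `convert` would reproduce the "Could not run 'quantized::linear' with arguments from the 'CPU' backend" error, since the quantized op never sees a quantized input.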
Quantized Inference on GPU
We don’t have official GPU support yet, but we have two prototypes:
(1) PyTorch quantization + fx2trt lowering, with inference in TensorRT (A100 and later GPUs): see examples in TensorRT/test_quant_trt.py at master · pytorch/TensorRT · GitHub
(2) Integration with cuDNN through native quantized CUDA ops: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub. This project is an early prototype and has been paused.
ONNX Support for my quantized model
Supporting export to ONNX models is not a priority for PyTorch quantization. Please open an issue in GitHub - onnx/onnx: Open standard for machine learning interoperability when you encounter problems with ONNX, or reach out to the people in this list: PyTorch Governance | Maintainers — PyTorch 1.12 documentation
LSTM quantization support
LSTM is supported through our custom module API in both eager mode and FX graph mode quantization.
Eager Mode: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub
FX Graph Mode: pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub
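The linked tests cover the custom module path. As a simpler, self-contained illustration that LSTM quantization works end to end, here is a sketch of a different supported path, dynamic quantization, which converts LSTM weights to int8 and quantizes activations on the fly (the layer sizes are illustrative):

```python
import torch

# Dynamically quantize an LSTM: int8 weights, fp32 activations.
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=1)
qlstm = torch.ao.quantization.quantize_dynamic(
    lstm, {torch.nn.LSTM}, dtype=torch.qint8
)

x = torch.randn(5, 3, 8)        # (seq_len, batch, input_size)
out, (h, c) = qlstm(x)          # same interface as the float LSTM
```

The quantized module keeps the float LSTM's call signature, so it can be dropped into existing inference code.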