Hey everyone! I am looking for a way to perform Quantization-Aware Training (QAT) using PyTorch.
My use case is deploying trained PyTorch models on custom hardware (silicon), so I have a few requirements:
- Needs to support nn.Conv1d (as this is part of the network that I want to deploy)
- Needs to support some form of batch-norm folding
- Needs to have power-of-two scales, as this avoids integer divisions in hardware (see the toy example right after this list)
- Preferably does not require me to redefine my model with quantized modules
- Supports up-to-date Python and PyTorch versions
In my search, I checked out every quantization-aware-training framework I could find and made a list of them (see below). Parentheses () give the last commit date, and [tried] in square brackets marks whether I actually tried it.
- QONNX: GitHub - fastmachinelearning/qonnx (3 weeks ago)
- [-] ONNX-only
- QKeras: GitHub - google/qkeras: QKeras: a quantization deep learning library for Tensorflow Keras (yesterday)
- [+] Power-of-two scaling
- [-] Need to rewrite network
- [-] Keras-only
- TorchQuant: GitHub - camlsys/torchquant: A Hackable Quantization Library for PyTorch (29 March 2021)
- [-] Doesn't support Conv1d
- MQBench: GitHub - ModelTC/MQBench: Model Quantization Benchmark (14 Feb 2023) [tried]
- [-] Doesn't work with PyTorch 1.13.1
- NNCF: GitHub - openvinotoolkit/nncf: Neural Network Compression Framework for enhanced OpenVINO™ inference (2 days ago) [tried]
- [-] Doesn't support power-of-two scaling
- [+] Amazing GitHub support
- [-] Hard to export quantized model parameters
- QPyTorch: GitHub - Tiiiger/QPyTorch: Low Precision Arithmetic Simulation in PyTorch (14 Jan 2022)
- [-] Not maintained
- AI Model Efficiency Toolkit (AIMET): GitHub - quic/aimet: AIMET is a library that provides advanced quantization and compression techniques for trained neural network models. (19 hours ago) [tried]
- [-] Only supports Python 3.8
- [-] Really bad support via GitHub
- [+] Very large feature set
- [+] Tutorial videos
- [+/-] Supports PyTorch 1.13 since latest commit, but no pre-built package available yet
- [-] Building on Python 3.10 is not trivial!
- Brevitas: GitHub - Xilinx/brevitas: Brevitas: quantization-aware training in PyTorch (10 Jan 2023)
- [-] Doesn't support batch norm
- [-] Requires you to rewrite your entire network
- pytorch-quantization: pytorch-quantization master documentation (very recent)
- [-] Only int8
- NEMO: GitHub - pulp-platform/nemo: NEural Minimizer for pytOrch (23 Feb 2022)
- [-] Not maintained anymore
- model_optimization: GitHub - sony/model_optimization: Model Compression Toolkit (MCT) (1 hour ago) [paper: https://arxiv.org/pdf/2109.09113.pdf, tried]
- [+] Power-of-two quantization
- [+] Simple to use
- [+] Very active repository and members
- [-] Not possible to get access to quantized weights?
- PyTorch native quantization (torch.ao.quantization):
- [-] Need to implement a custom observer + fake quantizer yourself (rough sketch below)
However, none of these options really work or have all the features that I need. Does anyone else have suggestions on what I can use/do?