How to quantize a CNN to 4 bits?

Hi all, I am new to PyTorch and to quantization. I want to quantize a CNN model to custom bitwidths. Could anyone provide a link to source code so that I can get some idea? Thank you all.

We do not support 4 bit currently, but contributions are welcome. Do you just want to try quantization-aware training, or do you want to run 4-bit kernels etc.?

Thank you for your time, Jerry. I want to perform quantization-aware training for a CNN model at lower bit precision than int8, and I want to know the exact procedure. I found some articles on the internet, but they mostly explain how to calculate the scale and zero point and how to quantize and dequantize. I want to know how to perform quantization-aware training.
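For reference, the scale/zero-point and quantize/dequantize math those articles describe can be sketched in plain Python for the 4-bit signed case (range [-8, 7]). The function names here are illustrative, not a PyTorch API:

```python
# Minimal sketch of affine quantization math for 4-bit signed values.
# Illustrative helper names, not part of PyTorch.

def choose_qparams(x_min, x_max, quant_min=-8, quant_max=7):
    """Compute scale and zero point mapping [x_min, x_max] onto the integer range."""
    x_min = min(x_min, 0.0)  # the range must include 0 so 0.0 is exactly representable
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (quant_max - quant_min)
    zero_point = round(quant_min - x_min / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point

def fake_quantize(x, scale, zero_point, quant_min=-8, quant_max=7):
    """Quantize then immediately dequantize, as FakeQuantize does during QAT."""
    q = round(x / scale) + zero_point
    q = max(quant_min, min(quant_max, q))  # clamp to the 4-bit range
    return (q - zero_point) * scale

scale, zp = choose_qparams(-1.0, 1.0)
y = fake_quantize(0.3, scale, zp)
```

During QAT this quantize-dequantize round trip is applied in the forward pass so the network learns with quantization noise, while gradients flow through via the straight-through estimator.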

Hi @Kai123,

You can check this thread.

Currently, there is the pytorch-quantization toolkit by NVIDIA, which lets you change the number of bits.


If you just need to do QAT, then you can try setting quant_min and quant_max in the FakeQuantize module, I think.

You can find the way we configure FakeQuantize here: pytorch/ at master · pytorch/pytorch · GitHub. We just need to configure FakeQuantize with quant_min and quant_max for 4 bit, e.g. -8 and 7, and then define the qconfig based on that.
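Putting that together, a sketch of a 4-bit qconfig might look like this; the observer choices and qschemes below are one reasonable configuration, not the only one:

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# 4-bit unsigned range [0, 15] for activations, 4-bit signed range [-8, 7] for weights.
act_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=15,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
)
weight_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8,
    quant_max=7,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)
qconfig_4bit = QConfig(activation=act_fake_quant, weight=weight_fake_quant)

# Attach the qconfig and prepare the model for QAT as usual.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
model.qconfig = qconfig_4bit
model_prepared = torch.quantization.prepare_qat(model.train())
out = model_prepared(torch.randn(1, 3, 8, 8))  # forward pass runs fake quantization
```

The fake-quantize ops clamp simulated integer values to the 4-bit range during training; note that conversion to an actual 4-bit backend is still not supported, so inference would need custom kernels.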
