How to Quantize a CNN to 4 Bits?

Hi all… I am new to PyTorch and to quantization. I want to quantize a CNN model to custom bit widths. Could anyone provide me a link to source code so that I can get some idea? Thank you all…

We do not support 4-bit currently, but contributions are welcome. Do you just want to try quantization-aware training, or do you want to run 4-bit kernels etc.?

Thank you for your time, Jerry… I want to perform quantization-aware training on a CNN model at a lower bit precision than int8, and I want to know the exact procedure. I found some articles on the internet, but they mostly cover how to calculate the scale and zero point and how to quantize and dequantize… I want to know how to perform quantization-aware training…

Hi @Kai123,

You can check this thread.

Currently, there is pytorch-quantization by NVIDIA.
You can change the number of bits.


If you just need to do QAT, then you can try setting quant_min and quant_max in the FakeQuantize module, I think.

You can find the way we configure FakeQuantize here: https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/qconfig.py#L129. We just need to configure FakeQuantize with quant_min and quant_max for 4-bit, e.g. -8 and 7, and then define the qconfig based on that.
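To make the -8/7 range concrete, here is a pure-Python sketch of the quantize-dequantize math that a FakeQuantize module simulates during QAT, using a signed 4-bit range (quant_min=-8, quant_max=7). The function name and structure are illustrative, not PyTorch's actual implementation:

```python
# Illustrative sketch of fake quantization for signed 4-bit values.
# Not PyTorch's FakeQuantize implementation; just the same math.

def fake_quantize(x, quant_min=-8, quant_max=7):
    """Quantize-dequantize a list of floats to simulate 4-bit precision."""
    lo, hi = min(x), max(x)
    lo, hi = min(lo, 0.0), max(hi, 0.0)           # range must cover zero
    scale = (hi - lo) / (quant_max - quant_min)   # step size per level
    zero_point = round(quant_min - lo / scale)    # integer mapping of 0.0
    zero_point = max(quant_min, min(quant_max, zero_point))
    out = []
    for v in x:
        q = round(v / scale) + zero_point         # quantize
        q = max(quant_min, min(quant_max, q))     # clamp to 4-bit range
        out.append((q - zero_point) * scale)      # dequantize
    return out

weights = [-1.0, -0.5, 0.0, 0.25, 0.9]
print(fake_quantize(weights))
```

During QAT the forward pass uses these "snapped" values while gradients flow through as if the rounding were the identity (the straight-through estimator), which is what FakeQuantize provides.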


Hello, I did that, but now how can I simulate model size as well? Sometimes I simulate weights and activations with different bit widths, for example weights with 4 bits and activations with 8 bits.

I think you'd probably need to estimate the model size, since supporting this at the tensor level would mean modifying PyTorch core right now, which is not easy to do. We might move quantization out of core and make extensions easier in the future.
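Such an estimate is just arithmetic over parameter counts: only stored weights contribute to model size (activation bit width affects runtime memory, not the checkpoint). A minimal sketch with hypothetical layer shapes, which you would replace with your own model's:

```python
# Back-of-envelope size estimate for a model with quantized weights.
# The layer shapes below are hypothetical; substitute your own.

def model_size_bytes(param_counts, weight_bits):
    """Estimate stored size of quantized weights in bytes."""
    return sum(param_counts) * weight_bits / 8

params = [3 * 16 * 3 * 3,     # conv1: 3 -> 16 channels, 3x3 kernel
          16 * 32 * 3 * 3,    # conv2: 16 -> 32 channels, 3x3 kernel
          32 * 7 * 7 * 10]    # fc: flattened 32x7x7 -> 10 classes

print(model_size_bytes(params, weight_bits=8))  # int8 baseline
print(model_size_bytes(params, weight_bits=4))  # 4-bit weights: half the size
```

This ignores per-tensor metadata (scales, zero points), which is usually negligible next to the weights themselves.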
