Quantization official example

Hi,

Are there any official quantization training examples?

We have examples using the previous eager mode quantization flow: Quantization Recipe — PyTorch Tutorials 2.0.1+cu117 documentation and https://github.com/pytorch/vision/blob/main/references/classification/train_quantization.py

but @andrewor is also working on a tutorial for the new quantization flow ((prototype) PyTorch 2.0 Export Post Training Static Quantization — PyTorch Tutorials 2.0.1+cu117 documentation).
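For a rough idea of what that new flow looks like, here is a minimal sketch based on the prototype PT2 export tutorial linked above. These are prototype APIs, so the names and signatures shown here (capture_pre_autograd_graph, XNNPACKQuantizer, the symmetric config) are assumptions tied to that release and may change:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Any fp32 model works here; a tiny conv net keeps the sketch self-contained.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# 1. Export the model to a graph the PT2E quantization passes understand.
exported = capture_pre_autograd_graph(model, example_inputs)

# 2. A quantizer decides how each operator gets quantized.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

# 3. Insert observers, calibrate on representative data, then convert.
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)            # calibration (use real data in practice)
quantized = convert_pt2e(prepared)
```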


Hi @jerryzh168 , @andrewor

I am interested in this statement from the link:
https://pytorch.org/docs/stable/quantization.html#
"In addition, PyTorch also supports quantization aware training, which models quantization errors in both the forward and backward passes using fake-quantization modules."

Do you have example code/doc for my reference?

Yeah, the doc I linked is the example for the eager mode quantization flow.
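In case it helps, here is a minimal eager-mode QAT sketch following that recipe. The model, layer sizes, and the "fbgemm" backend are placeholders/assumptions, not your actual setup:

```python
import torch
from torch import nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class TinyNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")  # use "qnnpack" for ARM targets
prepare_qat(model, inplace=True)  # inserts fake-quant modules for QAT

# ... run the normal training loop here so the model adapts to fake quantization ...

model.eval()
int8_model = convert(model)  # swaps modules for their int8 counterparts
```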

@jerryzh168

Dear Jerry,
What I am looking for is quantization aware training. Is your link the same as what I asked about?

Yes, that’s correct. Did you see it?

@jerryzh168

I read the link (Quantization — PyTorch 2.1 documentation) again and again, and finally found that Quantization Aware Training, which is what I need, only has limited support for now.

When will the Quantization Aware Training feature be ready?

Hi, @jerryzh168

according to the docs here (Quantization — PyTorch 2.1 documentation), very few operators are supported.

My questions are:

  1. nn.Conv1d/2d/3d are listed as not supported. I don't understand this: if they are not supported, how could you test CNN accuracy in Introduction to Quantization on PyTorch | PyTorch?

  2. BatchNorm2d is not supported either. Is my understanding correct?

  3. The purpose of quantizing a CNN is to run inference in int8. If some operators are not supported, there is an operator gap: at inference time some operators run in float32 while others run in int8, so we have to quantize some layers to int8 and dequantize others back to float32. Those quantize and dequantize steps are very inefficient.

This is talking about quantization aware training support for dynamic quantization; it probably only supports linear.

For QAT support with static quantization (the next row), we have much broader support, especially for CNNs.

  1. It's talking about dynamic quantization, I think.
  2. Please look at the column names; this table is talking about support for static and dynamic quantization.
  3. Yeah, agree (see the QuantStub/DeQuantStub sketch below on keeping unsupported ops in fp32).
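On point 3: in eager mode you choose where the int8 region begins and ends by placing QuantStub/DeQuantStub yourself, so an unsupported layer can simply be left outside that region in fp32. A minimal sketch (the module names and the "fp32-only" layer are just placeholders for illustration):

```python
import torch
from torch import nn
from torch.ao.quantization import QuantStub, DeQuantStub

class MixedNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()          # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3)   # runs in int8 after convert
        self.dequant = DeQuantStub()      # int8 -> fp32 boundary
        self.head = nn.Linear(16, 10)     # kept in fp32 here, for illustration
        # when preparing for quantization, also set self.head.qconfig = None
        # so prepare()/convert() skip the fp32-only part

    def forward(self, x):
        x = self.quant(x)        # enter the quantized region once
        x = self.conv(x)
        x = self.dequant(x)      # leave it before the fp32-only part
        x = x.mean(dim=(2, 3))   # global average pool in fp32
        return self.head(x)
```

Each quantize/dequantize crossing has a cost, so grouping the quantized layers into one contiguous region (rather than bouncing back and forth per layer) keeps the overhead small.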

Dear @jerryzh168

I have checked the quantization docs in the PyTorch tutorials, and I have also checked the TensorRT docs. I have tried both on my network, which includes only Conv2d, BN, and ReLU layers.
The accuracy is very good if I quantize Conv2d and ReLU but leave the BN layer unquantized; however, training doesn't converge if I quantize Conv2d, BN, and ReLU.

I don't know whether there is something wrong in the PyTorch quantization code or something wrong with my code.
Do you have any suggestions or ideas about my quantization setup?

Can you share the code you use to quantize the model? If you quantize conv - bn - relu, please make sure you have fused them into a special qat_conv_bn_relu module first.
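A minimal sketch of that fusion step in the eager-mode QAT flow (the model and the attribute names "conv"/"bn"/"relu" are assumptions standing in for your network):

```python
import torch
from torch import nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, fuse_modules_qat, get_default_qat_qconfig,
    prepare_qat, convert,
)

class ConvBnReluNet(nn.Module):  # placeholder for your network
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 16, 3)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.bn(self.conv(self.quant(x)))))

model = ConvBnReluNet().train()

# Fuse conv + bn + relu so QAT treats them as a single fake-quantized op;
# prepare_qat then swaps in the QAT ConvBnReLU2d module for the fused block.
fused = fuse_modules_qat(model, [["conv", "bn", "relu"]])

fused.qconfig = get_default_qat_qconfig("fbgemm")  # backend choice is an assumption
prepare_qat(fused, inplace=True)

# ... fine-tune with your training loop ...

int8_model = convert(fused.eval())  # folds bn and produces the int8 conv+relu
```

Without the fusion, conv, bn and relu are observed and fake-quantized separately, which can hurt convergence; fusing them before prepare_qat is what the eager-mode QAT recipe does as well.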