Quantization official example

Hi,

Are there any official quantization training examples?

We have examples using the previous eager mode quantization flow: Quantization Recipe — PyTorch Tutorials 2.0.1+cu117 documentation and https://github.com/pytorch/vision/blob/main/references/classification/train_quantization.py

but @andrewor is also working on a tutorial for the new quantization flow ((prototype) PyTorch 2.0 Export Post Training Static Quantization — PyTorch Tutorials 2.0.1+cu117 documentation).
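For a rough idea of what that new flow looks like, here is a minimal sketch based on the prototype PT2 export tutorial linked above. These are prototype APIs, so the names and signatures shown here (capture_pre_autograd_graph, XNNPACKQuantizer, the symmetric config) are assumptions tied to that release and may change:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Any fp32 model works here; a tiny conv net keeps the sketch self-contained.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# 1. Export the model to a graph the PT2E quantization passes understand.
exported = capture_pre_autograd_graph(model, example_inputs)

# 2. A quantizer decides how each operator gets quantized.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

# 3. Insert observers, calibrate on representative data, then convert.
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)            # calibration (use real data in practice)
quantized = convert_pt2e(prepared)
```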


Hi @jerryzh168 , @andrewor

I am interested in this statement from the link:
https://pytorch.org/docs/stable/quantization.html#
"In addition, PyTorch also supports quantization aware training, which models quantization errors in both the forward and backward passes using fake-quantization modules."

Do you have example code/doc for my reference?

Yeah, the doc I linked is the example for the eager mode quantization flow.
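In case it helps, here is a minimal eager-mode QAT sketch following that recipe. The model, layer sizes, and the "fbgemm" backend are placeholders/assumptions, not your actual setup:

```python
import torch
from torch import nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class TinyNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")  # use "qnnpack" for ARM targets
prepare_qat(model, inplace=True)  # inserts fake-quant modules for QAT

# ... run the normal training loop here so the model adapts to fake quantization ...

model.eval()
int8_model = convert(model)  # swaps modules for their int8 counterparts
```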

@jerryzh168

Dear Jerry,
What I am looking for is quantization aware training. Is your link the same as what I asked about?

Yes, that’s correct. Did you see it?

@jerryzh168

I read the link (Quantization — PyTorch 2.1 documentation) again and again, and finally found that Quantization Aware Training, which is what I need, only has limited support for now.

When will the Quantization Aware Training feature be ready?

Hi, @jerryzh168

according to the docs here (Quantization — PyTorch 2.1 documentation), very few operators are supported.

My questions are:

  1. nn.Conv1d/2d/3d are listed as not supported. I don't understand this: if they are not supported, how could you test CNN accuracy in Introduction to Quantization on PyTorch | PyTorch?

  2. BatchNorm2d is not supported either. Is my understanding correct?

  3. The purpose of quantizing a CNN is to run inference in int8. If some operators are not supported, there is an operator gap: at inference time some operators run in float32 while others run in int8, so we have to quantize some layers to int8 and dequantize others back to float32. Those quantize and dequantize steps are very inefficient.

This is talking about quantization aware training support for dynamic quantization; it probably only supports linear.

For QAT support with static quantization (the next row), we have much broader support, especially for CNNs.

  1. It's talking about dynamic quantization, I think.
  2. Please look at the column names; this table is talking about support for static and dynamic quantization.
  3. Yeah, agree (see the QuantStub/DeQuantStub sketch below on keeping unsupported ops in fp32).
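On point 3: in eager mode you choose where the int8 region begins and ends by placing QuantStub/DeQuantStub yourself, so an unsupported layer can simply be left outside that region in fp32. A minimal sketch (the module names and the "fp32-only" layer are just placeholders for illustration):

```python
import torch
from torch import nn
from torch.ao.quantization import QuantStub, DeQuantStub

class MixedNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()          # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3)   # runs in int8 after convert
        self.dequant = DeQuantStub()      # int8 -> fp32 boundary
        self.head = nn.Linear(16, 10)     # kept in fp32 here, for illustration
        # when preparing for quantization, also set self.head.qconfig = None
        # so prepare()/convert() skip the fp32-only part

    def forward(self, x):
        x = self.quant(x)        # enter the quantized region once
        x = self.conv(x)
        x = self.dequant(x)      # leave it before the fp32-only part
        x = x.mean(dim=(2, 3))   # global average pool in fp32
        return self.head(x)
```

Each quantize/dequantize crossing has a cost, so grouping the quantized layers into one contiguous region (rather than bouncing back and forth per layer) keeps the overhead small.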

Dear @jerryzh168

I have checked the quantization docs in the PyTorch tutorials, and I have also checked the TensorRT docs. I have tried both on my network, which includes only Conv2d, BN, and ReLU layers.
The accuracy is very good if I quantize Conv2d and ReLU but leave the BN layer unquantized; however, training doesn't converge if I quantize Conv2d, BN, and ReLU.

I don't know whether there is something wrong in the PyTorch quantization code or something wrong with my code.
Do you have any suggestions or ideas about my quantization setup?

Can you share the code you use to quantize the model? If you quantize conv - bn - relu, please make sure you have fused them into a special qat_conv_bn_relu module first.
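A minimal sketch of that fusion step in the eager-mode QAT flow (the model and the attribute names "conv"/"bn"/"relu" are assumptions standing in for your network):

```python
import torch
from torch import nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, fuse_modules_qat, get_default_qat_qconfig,
    prepare_qat, convert,
)

class ConvBnReluNet(nn.Module):  # placeholder for your network
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 16, 3)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.bn(self.conv(self.quant(x)))))

model = ConvBnReluNet().train()

# Fuse conv + bn + relu so QAT treats them as a single fake-quantized op;
# prepare_qat then swaps in the QAT ConvBnReLU2d module for the fused block.
fused = fuse_modules_qat(model, [["conv", "bn", "relu"]])

fused.qconfig = get_default_qat_qconfig("fbgemm")  # backend choice is an assumption
prepare_qat(fused, inplace=True)

# ... fine-tune with your training loop ...

int8_model = convert(fused.eval())  # folds bn and produces the int8 conv+relu
```

Without the fusion, conv, bn and relu are observed and fake-quantized separately, which can hurt convergence; fusing them before prepare_qat is what the eager-mode QAT recipe does as well.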