Question about quint8 and qint8

I have obtained quantization parameters through PyTorch quantization and now I want to perform inference based on these parameters. However, I have run into an issue: the quantized result of a layer can be greater than 128, for example 200, and PyTorch represents this value using quint8. Additionally, some computed values are 0, such as after applying ReLU to negative numbers.

During the computation of the next layer, I should subtract the zero point from the input. However, since the zero point is not zero, let's say it is 10, subtracting 10 from 0 gives -10, which cannot be represented in uint8 and requires int8. On the other hand, subtracting 10 from 200 gives 190, which fits in uint8 but not in int8. So neither int8 nor uint8 can hold both results, which seems like a contradiction, and I'm unsure how to perform the calculation correctly.
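To make the arithmetic concrete, here is a minimal NumPy sketch of the two cases I mean (the values 0, 200 and the zero point 10 are just the examples above):

```python
import numpy as np

zp = 10
x = np.array([0, 200], dtype=np.uint8)

# Staying in uint8, the subtraction wraps around for the small value:
print(x - np.uint8(zp))          # [246 190] -- 0 - 10 underflows to 246

# The signed results I actually want:
print(x.astype(np.int16) - zp)   # [-10 190] -- but -10 does not fit in uint8
```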

For example, in this particular layer, the input zero point (input_zp) is 58, but the computed minimum value is 0.

In the next layer, on the other hand, the input zero point (input_zp) is 0, and the computed maximum value is 144.


It looks like you might have some confusion around how quantization is done. Can you take a look at Quantization - Neural Network Distiller to see if it helps?
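To illustrate the usual resolution of the contradiction you describe (this is the general approach in integer-only inference, not something specific to Distiller or to PyTorch's internals): the zero-point subtraction is performed after widening to a larger signed type, typically int32, so intermediate values like -10 never need to fit in uint8 or int8; only the final requantized output is clipped back to the 8-bit range. A rough sketch, assuming per-tensor affine quantization, with a hypothetical helper name `quantized_linear`:

```python
import numpy as np

def quantized_linear(x_q, w_q, x_zp, w_zp, x_scale, w_scale, out_scale, out_zp):
    """Toy quantized matmul: uint8 activations, int8 weights, int32 accumulation."""
    # Widen BEFORE subtracting zero points; the results may be negative,
    # which is fine in int32.
    x_i32 = x_q.astype(np.int32) - x_zp
    w_i32 = w_q.astype(np.int32) - w_zp
    acc = x_i32 @ w_i32                      # int32 accumulator
    # Requantize: rescale the accumulator and shift by the output zero point,
    # then clip back into the uint8 range.
    out = np.round(acc * (x_scale * w_scale / out_scale)) + out_zp
    return np.clip(out, 0, 255).astype(np.uint8)

# Using the numbers from the question: input 0 and 200 with zero point 10.
x_q = np.array([[0, 200]], dtype=np.uint8)
w_q = np.array([[1], [1]], dtype=np.int8)    # trivial weights for illustration
y = quantized_linear(x_q, w_q, x_zp=10, w_zp=0,
                     x_scale=0.1, w_scale=0.1, out_scale=0.1, out_zp=0)
print(y)  # [[18]]
```

So -10 and 190 coexist in the int32 accumulator without any conflict; the uint8/int8 distinction only matters for how the stored tensors are interpreted, not for the intermediate arithmetic.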