Purpose of scale and zero point for layer

Hi,

I have this

{'model_state_dict': OrderedDict([(u'conv_3x3_32a.weight', tensor([[[[ 0.0367,  0.0294, -0.1065],
          [ 0.0918,  0.1065, -0.0331],
          [-0.0147,  0.0184, -0.1028]]],
        .......
        [[[ 0.1249,  0.0661, -0.0257],
          [ 0.0735, -0.0257, -0.1028],
          [ 0.0441, -0.0698, -0.0771]]]], size=(40, 1, 3, 3),
       dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.00367316859774, zero_point=0)), (u'conv_3x3_32a.scale', tensor(0.0031)), (u'conv_3x3_32a.zero_point', tensor(160))

I understand that the weight tensor has its own scale, which is 0.00367316859774, but I have 2 questions:

  1. What is the purpose of the layer scale and zero point? Where do I use them?
  2. How can I find the re-quantization scale used after the weight and input multiplication and accumulation? I don’t know how to access it.

scale and zero point are the quantization parameters for the layer. They are used to quantize the weights from the fp32 domain to the int8 domain.
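For reference, here is a minimal sketch of that per-tensor affine mapping, using the scale and zero point shown inside the weight tensor above (the three fp32 values are just copied from the printout, nothing model-specific):

```python
import torch

# A few fp32 weight values copied from the printout above, just for illustration.
w_fp32 = torch.tensor([0.0367, 0.0918, -0.1065])

scale, zero_point = 0.00367316859774, 0

# q = round(x / scale) + zero_point, clamped to the qint8 range [-128, 127]
w_q = torch.quantize_per_tensor(w_fp32, scale=scale, zero_point=zero_point, dtype=torch.qint8)

print(w_q.int_repr())    # stored int8 values, e.g. tensor([ 10,  25, -29], dtype=torch.int8)
print(w_q.dequantize())  # fp32 values recovered as (q - zero_point) * scale
```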

The re-quantization scale is derived from the input, weight and output scales:
requant_scale = input_scale_fp32 * weight_scale_fp32 / output_scale_fp32

The conversion from accumulated int32 values to fp32 happens in the quantization backends, either FBGEMM or QNNPACK, and the requantization scale can be found there.
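As a rough sketch only (assuming an eager-mode converted model where a Quantize module named quant feeds conv_3x3_32a; the attribute names are taken from the question and are placeholders for whatever your model actually uses), the same factor could be computed from the module attributes:

```python
def requant_scale(model):
    # Sketch only: assumes `model` is already converted to a quantized model and
    # `model.quant` is the Quantize module producing the int8 input of the conv.
    input_scale  = model.quant.scale.item()               # scale of the int8 input activation
    weight_scale = model.conv_3x3_32a.weight().q_scale()  # per-tensor scale of the int8 weight
    output_scale = model.conv_3x3_32a.scale               # scale of the int8 output activation

    # The backend (FBGEMM / QNNPACK) rescales the int32 accumulator by this factor:
    return input_scale * weight_scale / output_scale
```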
cc @dskhudia


Hi,

thank you so much for your answer, and please excuse me, but there are 2 scales that seem connected to the weights: the one inside the tensor parameters and the one of the layer. In my example:

scale=0.00367316859774, zero_point=0

and

(u'conv_3x3_32a.scale', tensor(0.0031))

so the numbers 0.00367316859774 and 0.0031, which generate different results. If 0.0031 does the actual conversion from FP32 to INT8, what does 0.00367316859774 do?

(u'conv_3x3_32a.scale', tensor(0.0031))

This scale value refers to the output scale of the layer, i.e. the scale used to quantize the layer's output activation (the weight has its own scale, stored inside the quantized weight tensor). Please see the docs for more details:
https://pytorch.org/docs/stable/quantization.html#id12
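To make the distinction concrete, here is a small sketch that reads both values straight from the checkpoint shown above (the file path is hypothetical; it just stands for wherever that dict was saved):

```python
import torch

ckpt = torch.load('checkpoint.pth')   # hypothetical path to the saved dict from the question
sd = ckpt['model_state_dict']

w = sd['conv_3x3_32a.weight']             # the quantized weight tensor
print(w.q_scale(), w.q_zero_point())      # 0.00367316859774, 0  -> params used to (de)quantize the weight

print(sd['conv_3x3_32a.scale'],           # tensor(0.0031) -> scale of the layer's output activation
      sd['conv_3x3_32a.zero_point'])      # tensor(160)    -> zero point of the output activation
```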

@dassima
I’m probably late to the thread, but what quantization scheme did you use to get zero_point=0?