Purpose of scale and zero point for layer


I have this state dict:

{'model_state_dict': OrderedDict([(u'conv_3x3_32a.weight', tensor([[
          [[ 0.0367,  0.0294, -0.1065],
          [ 0.0918,  0.1065, -0.0331],
          [-0.0147,  0.0184, -0.1028]]],
          [[[ 0.1249,  0.0661, -0.0257],
          [ 0.0735, -0.0257, -0.1028],
          [ 0.0441, -0.0698, -0.0771]]]], size=(40, 1, 3, 3),
       dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.00367316859774, zero_point=0)), (u'conv_3x3_32a.scale', tensor(0.0031)), (u'conv_3x3_32a.zero_point', tensor(160))

I understand that the weights tensor has its own scale which is 0.00367316859774, but I have 2 questions:

  1. What is the purpose of the layer scale and zero point? Where do I use them?
  2. How can I find the re-quantization scale used after the weight and input multiplication and accumulation? I don’t know how to access it.

The scale and zero point are the quantization parameters for the layer. They are used to quantize the weight from the fp32 domain to the int8 domain.

The re-quantization scale is derived from the input, weight, and output scales. It is defined as
requant_scale = input_scale_fp32 * weight_scale_fp32 / output_scale_fp32
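
To make the formula above concrete, here is a minimal pure-Python sketch of a single dot product going through quantize, integer accumulate, and requantize. The weight scale and the layer scale / zero point come from the post; the input scale, input zero point, and activation values are made up for illustration, since the post does not show them. It also uses a plain floating-point multiply for the requantization step and leaves out the bias, which is a simplification and not a description of how FBGEMM or QNNPACK implement it.

```python
def quantize(x, scale, zero_point, qmin, qmax):
    """Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping from the quantized value back to a real value."""
    return (q - zero_point) * scale


# Weight scale and layer (output) scale / zero point are taken from the post.
weight_scale, weight_zp = 0.00367316859774, 0
output_scale, output_zp = 0.0031, 160
# ASSUMPTION: the post does not show the input quantization parameters.
input_scale, input_zp = 0.02, 128

requant_scale = input_scale * weight_scale / output_scale

x_fp32 = [0.5, -0.26, 1.0]           # hypothetical activations
w_fp32 = [0.0367, 0.0294, -0.1065]   # first weight row printed in the post

x_q = [quantize(x, input_scale, input_zp, 0, 255) for x in x_fp32]       # quint8 range
w_q = [quantize(w, weight_scale, weight_zp, -128, 127) for w in w_fp32]  # qint8 range

# Integer multiply-accumulate with the zero points removed (int32 accumulator).
acc_int32 = sum((xq - input_zp) * (wq - weight_zp) for xq, wq in zip(x_q, w_q))

# Requantize the accumulator into the output's quint8 domain.
y_q = max(0, min(255, round(acc_int32 * requant_scale) + output_zp))

y_dequant = dequantize(y_q, output_scale, output_zp)
y_float = sum(x * w for x, w in zip(x_fp32, w_fp32))
print(y_q, y_dequant, y_float)
```

The dequantized result should land within about one output step (`output_scale`) of the plain float dot product, which is a quick sanity check that the three scales have been combined in the right direction.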

The conversion from accumulated int32 values to fp32 happens in the quantization backends, either FBGEMM or QNNPACK, and the requantization scale can be found there.
cc @dskhudia

Thank you so much for your answer, and please excuse me, but there are 2 scales that seem connected to the weights: the one inside the tensor parameters and the one of the layer. In my example these are 0.00367316859774 and 0.0031, and they generate different results. If 0.0031 does the actual conversion from FP32 to INT8, what does 0.00367316859774 do?

(u'conv_3x3_32a.scale', tensor(0.0031))

This scale value refers to the output scale of the layer. Please see the docs for more details.
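
A quick way to check which of the two scales belongs to the weights, using only the numbers printed in the post: dequantized weights are integer multiples of the scale they were quantized with (the zero point is 0 here), so dividing by the right scale should give values very close to whole numbers. This is just a sketch; the helper name `max_distance_to_integer` is made up for illustration, and both the printed weights and the printed 0.0031 are rounded, so the match is approximate.

```python
weight_scale = 0.00367316859774   # scale stored inside the weight tensor
layer_scale = 0.0031              # conv_3x3_32a.scale, as printed (rounded)

# First 3x3 kernel from the printed state dict.
printed_weights = [0.0367, 0.0294, -0.1065,
                   0.0918, 0.1065, -0.0331,
                   -0.0147, 0.0184, -0.1028]


def max_distance_to_integer(values, scale):
    """Largest distance between value / scale and its nearest integer."""
    return max(abs(v / scale - round(v / scale)) for v in values)


dev_weight = max_distance_to_integer(printed_weights, weight_scale)
dev_layer = max_distance_to_integer(printed_weights, layer_scale)
print(dev_weight, dev_layer)
```

With the tensor's own scale every weight sits almost exactly on the integer grid, while with the layer scale they do not, which is consistent with 0.00367316859774 being the weight scale and 0.0031 being the output scale.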

I’m probably late to the thread, but what quantization scheme did you use to get zero_point=0?