The scale and zero point are the quantization parameters for the layer. They are used to quantize the weights from the fp32 domain to the int8 domain.
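As a rough sketch of what that mapping looks like (the clamp range and rounding mode here are assumptions for a signed int8 affine scheme, not the exact backend implementation):

```python
import numpy as np

def quantize(w_fp32, scale, zero_point):
    # Affine quantization: q = clamp(round(w / scale) + zero_point, -128, 127)
    q = np.round(w_fp32 / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q_int8, scale, zero_point):
    # Approximate inverse mapping back to fp32
    return (q_int8.astype(np.float32) - zero_point) * scale
```

So the scale sets the step size of the int8 grid and the zero point shifts it so that fp32 zero is representable exactly.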
The re-quantization scale is defined in terms of the input, weight, and output scales:
requant_scale = input_scale_fp32 * weight_scale_fp32 / output_scale_fp32
The conversion from the accumulated int32 values back to the int8 output domain happens in the quantization backends (either FBGEMM or QNNPACK), and the requantization scale can be found there.
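A minimal sketch of how that requantization step works, assuming made-up example scales and a signed int8 output with zero point 0 (the real backends use fixed-point arithmetic rather than float multiplies, so this only illustrates the math):

```python
import numpy as np

# Hypothetical per-layer scales, for illustration only
input_scale, weight_scale, output_scale = 0.02, 0.005, 0.1
requant_scale = input_scale * weight_scale / output_scale

def requantize(acc_int32, requant_scale, out_zero_point=0):
    # Scale the int32 accumulator down into the int8 output domain
    q = np.round(acc_int32 * requant_scale) + out_zero_point
    return np.clip(q, -128, 127).astype(np.int8)
```

The int32 accumulator holds sums of products of int8 values, so it is implicitly in units of `input_scale * weight_scale`; multiplying by `requant_scale` re-expresses it in units of `output_scale`.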
cc @dskhudia
Thank you so much for your answer, and please excuse me, but there are two scales that seem connected to the weights: the one inside the tensor parameters and the one of the layer. In my example:
and
So the numbers 0.00367316859774 and 0.0031 generate different results. If 0.0031 does the actual conversion from fp32 to int8, what does 0.00367316859774 do?