Bias in MobileNet V2 int8 model

I am trying to understand the computation flow of the PyTorch MobileNet V2 int8 model, and I want to know how the bias, scale and zero-point are applied in a fused convolution layer. For instance, the layer below has 4 params in the state_dict: weight, bias, scale and zero_point. The weight is quantized from FP32 to INT8 with its own scale 0.1069 and zero-point, and the scale 0.0693 is presumably used to requantize the accumulated result to INT8 for the next layer (a quick check of the weight quantization is sketched after the printout below). But how is the bias applied? Is it added to the accumulated result after the multiplications? The bias values look pretty small compared to the accumulated results.

('features.1.conv.0.0.weight',
 tensor([[[[-0.1069, -0.1069, -0.1069],
           [-0.1069,  0.0000,  0.8550],
           [-0.1069, -0.1069,  0.1069]]],

         [[[ 0.9619, -0.4275, -0.7482],
           [ 4.3820, -0.3206, -3.9545],
           [ 0.9619, -0.2138, -0.5344]]]], size=(32, 1, 3, 3),
        dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine,
        scale=0.10687889158725739, zero_point=0)),
('features.1.conv.0.0.bias',
 tensor([-1.1895e-02,  8.7035e-01, -6.8617e-02,  3.8501e-01,  3.2915e-01,
          8.4619e-01, -1.9708e-01], requires_grad=True)),
('features.1.conv.0.0.scale', tensor(0.0693)),
('features.1.conv.0.0.zero_point', tensor(0)),
('features.1.conv.1.weight',

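For reference, the printed weight values are just the dequantized view of the stored int8 numbers: dividing them by the weight scale recovers small integers (e.g. -0.1069 / 0.10688 ≈ -1 and 0.8550 / 0.10688 ≈ 8). A minimal sketch of that check, assuming the pretrained quantized MobileNet V2 from torchvision (the pretrained/quantize flags are the older torchvision API; newer versions use weights=):

import torch
from torchvision.models.quantization import mobilenet_v2

# Load the quantized MobileNet V2 whose state_dict is shown above
model = mobilenet_v2(pretrained=True, quantize=True)

w = model.state_dict()['features.1.conv.0.0.weight']
print(w.q_scale(), w.q_zero_point())  # per-tensor scale / zero-point
print(w.int_repr()[0])                # raw int8 values, e.g. -1, 0, 8
print(w.dequantize()[0])              # int8 * scale -> the values printed above
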
For Conv and Linear operations the bias stored in the state_dict is in FP32.
For FBGEMM the bias is not quantized and we add the bias to the result of the final quantized matrix multiplication.
For QNNPACK the bias is quantized to int32 (internally in the operator) and then added to the intermediate quantized output.
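
So, per output element, the int8 kernel produces an int32 accumulator; multiplying by s_in * s_w maps it back to real values, the FP32 bias is added, and the result is requantized with this layer's scale/zero_point. Below is a rough numeric sketch of that flow (not the actual FBGEMM/QNNPACK kernel code). The input params s_in/z_in and the toy weight/input values are made up for illustration; only s_w, s_out and the first bias element come from the printout above.

import torch

# Quant params (s_in / z_in are assumed; they come from the previous layer)
s_w,   z_w   = 0.10687889158725739, 0   # weight scale / zero-point
s_in,  z_in  = 0.02, 0                  # assumed input activation params
s_out, z_out = 0.0693, 0                # this layer's output scale / zero-point
bias_fp32    = -1.1895e-02              # FP32 bias of the first output channel

# Toy int8 weights / inputs for one 3x3 receptive field of that channel
w_q = torch.tensor([-1, -1, -1, -1, 0, 8, -1, -1, 1], dtype=torch.int32)
x_q = torch.tensor([10, 12,  9, 11, 13, 12, 10,  9, 11], dtype=torch.int32)

# 1) the int8 kernel accumulates products in int32
acc_i32 = torch.sum((w_q - z_w) * (x_q - z_in)).item()

# 2) FBGEMM-style path: rescale to FP32 and add the FP32 bias
acc_f32 = s_w * s_in * acc_i32 + bias_fp32

# 3) requantize for the next layer with this layer's scale / zero_point
y_q = int(round(acc_f32 / s_out)) + z_out
y_q = max(0, min(255, y_q))             # assuming quint8 activations

# QNNPACK-style path: quantize the bias to int32 with scale s_in * s_w and
# add it to the int32 accumulator before requantizing
bias_i32 = int(round(bias_fp32 / (s_in * s_w)))
acc_with_bias = acc_i32 + bias_i32

This also shows why the bias looks so small next to the raw accumulated values: the accumulator is in integer units and only becomes comparable to the FP32 bias after it is rescaled by s_in * s_w.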

@supriyar Hello, I want to set the bias to int8. What do I need to do? Thanks.