Manually performing quantization yields a different result than expected

I have a model that’s quite small: 2 inputs, a hidden layer with 3 nodes, and a 7-class output. Eventually I’d like to load this model onto hardware and use a fixed-point representation for some of the values. What I’m confused about is how the quantization happens, and which of the different scales/zero points to use when.

For instance, this is my state_dict:

OrderedDict([('input_layer_input_scale_0', tensor(0.0039)),
             ('input_layer_input_zero_point_0', tensor(0)),
             ('input_layer.scale', tensor(0.0297)),
             ('input_layer.zero_point', tensor(0)),
             ('input_layer._packed_params.dtype', torch.qint8),
             ('input_layer._packed_params._packed_params',
              (tensor([[-0.1180,  0.1180],
                       [-0.2949, -0.5308],
                       [-3.3029, -7.5496]], size=(3, 2), dtype=torch.qint8,
                      quantization_scheme=torch.per_tensor_affine, scale=0.05898105353116989,
                      zero_point=0),
               Parameter containing:
               tensor([-0.4747, -0.3563,  7.7603], requires_grad=True))),
             ('out.scale', tensor(1.5963)),
             ('out.zero_point', tensor(243)),
             ('out._packed_params.dtype', torch.qint8),
             ('out._packed_params._packed_params',
              (tensor([[  0.4365,   0.4365, -55.4356],
                       [  0.4365,   0.0000,   1.3095],
                       [  0.4365,   0.0000, -13.9680],
                       [  0.4365,  -0.4365,   4.3650],
                       [  0.4365,   0.4365,  -3.0555],
                       [  0.4365,   0.0000,  -1.3095],
                       [  0.4365,   0.0000,   3.0555]], size=(7, 3), dtype=torch.qint8,
                      quantization_scheme=torch.per_tensor_affine, scale=0.43650051951408386,
                      zero_point=0),
               Parameter containing:
               tensor([ 19.2761,  -1.0785,  14.2602, -22.3171,  10.1059,   7.2197, -11.7253],
                      requires_grad=True)))])
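
For reference, the float matrices printed above are dequantized views; the stored integers can be read back directly from the quantized weights (assuming mq is the converted model, as in the call further down):

w_q = mq.input_layer.weight()    # the quantized weight tensor
w_q.int_repr()                   # stored int8 values, e.g. round(-0.1180 / 0.05898...) = -2
w_q.dequantize()                 # (int_repr() - zero_point) * scale, the floats shown above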

If I then give the model a set of inputs like this:

inputs = np.array(
     [[1.  , 1.  ],   # class 0 example
      [1.  , 0.  ],   # class 1 example
      [0.  , 1.  ],   # class 2 example
      [0.  , 0.  ],   # class 3 example
      [0.  , 0.9 ],   # class 4 example
      [0.  , 0.75],   # class 5 example
      [0.  , 0.25]])  # class 6 example

I can verify decent accuracy with this:

>>> mq(torch.from_numpy(inputs).float()).argmax(-1)
tensor([0, 1, 2, 3, 4, 5, 1])

It gets the last one wrong but the others right; that doesn’t matter here, because I’m just trying to reproduce this result. My question then becomes: how do I use the scales and zero points to reproduce this result by hand?
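
(For reference, per-tensor affine quantization maps q = clamp(round(x / scale) + zero_point, qmin, qmax) and dequantizes as x ≈ (q - zero_point) * scale; e.g. an input of 0.75 at scale 0.0039 with zero point 0 becomes round(0.75 / 0.0039) = 192.)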

I thought it would be something like

W1 = input_layer_weights / 0.05898105353116989 + 0 # weight scale and weight zero point, as I understand it
b1 = input_layer_bias / 0.05898105353116989 + 0    # same as above
Z1 = saturate(round(inputs @ W1.T + b1.T))

But even at this point the result is different from mq.input_layer(quantized_input), so this makes me believe I’m doing the math incorrectly.

What am I missing with the scales and the bias? Why are there input_layer_input_scale_0, input_layer.scale, and a scale associated with the weight matrix, all with different values?

Hi Chris,

I believe that the inputs to your quantized model need to be quantized as well, which is what input_layer_input_scale_0 and input_layer_input_zero_point_0 refer to.
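
(That step can be reproduced with torch.quantize_per_tensor, using the stored input scale and zero point; the layer’s integer output is then the target for the manual math. A quick check, assuming quint8 activations:)

x_q = torch.quantize_per_tensor(torch.from_numpy(inputs).float(),
                                scale=0.0039, zero_point=0, dtype=torch.quint8)
x_q.int_repr()                  # the integers the first layer actually sees
mq.input_layer(x_q).int_repr()  # the integers the manual math should reproduce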

I think you additionally need to quantize the output of the linear, which is what input_layer.scale and input_layer.zero_point refer to.

Do you get the expected results if you calculate it like this?

# quantize the weights (weight scale and weight zero point, as I understand it)
W1 = input_layer_weights / 0.05898105353116989 + 0
b1 = input_layer_bias / 0.05898105353116989 + 0    # same as above

# quantize the inputs as well
quantized_inputs = inputs / 0.0039 + 0

# quantize the output of the layer
Z1 = saturate(round(quantized_inputs @ W1.T + b1.T)) / 0.0297 + 0

No, this still gives an incorrect result. I think your last step would need to multiply by 0.0297 instead, because the output eventually has to become a float value again; otherwise the numbers just keep growing larger and larger. But even if I do that, I get an incorrect result.
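
For reference, here is a minimal NumPy sketch of the arithmetic that per-tensor affine quantized Linear layers typically implement, assuming quint8 activations (0..255), qint8 weights with zero point 0, and fbgemm-style handling of the float bias (quantized at input_scale * weight_scale). W1_float/b1_float and W2_float/b2_float stand for the float weights and biases shown in the state_dict printout. If I understand the backend correctly, the two missing pieces relative to the math above are that the bias uses the combined scale, and that requantization multiplies by (input_scale * weight_scale) / output_scale rather than dividing:

import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # per-tensor affine: q = clamp(round(x / scale) + zero_point, qmin, qmax)
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax)

def quantized_linear(x_q, s_x, z_x, W_q, s_w, bias_float, s_y, z_y):
    b_q = np.round(bias_float / (s_x * s_w))   # bias quantized at s_x * s_w
    acc = (x_q - z_x) @ W_q.T + b_q            # integer accumulator, scale s_x * s_w
    # requantize: multiply by the scale ratio, then shift and clamp to quint8
    return np.clip(np.round(acc * (s_x * s_w) / s_y) + z_y, 0, 255)

# first layer: scales/zero points read off the state_dict above
x_q  = quantize(inputs, 0.0039, 0)
W1_q = np.round(W1_float / 0.05898105353116989)   # equals w_q.int_repr()
h_q  = quantized_linear(x_q, 0.0039, 0, W1_q, 0.05898105353116989, b1_float, 0.0297, 0)

# second layer: its input scale/zero point are the first layer's output pair
W2_q = np.round(W2_float / 0.43650051951408386)
y_q  = quantized_linear(h_q, 0.0297, 0, W2_q, 0.43650051951408386, b2_float, 1.5963, 243)

# dequantize at the very end; this is monotonic, so argmax(y_q) works too
y = (y_q - 243) * 1.5963

If there is a ReLU between the two layers, note that with quint8 activations and a zero point of 0 it leaves the integer representation unchanged (max(q, 0) = q), so the sketch is unaffected.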