Hello everyone,

I am trying to quantize a MobileNetV3 model that I trained on the MNIST handwritten digit dataset.

In the quantization documentation on the PyTorch website, I came across the prototype tutorial “FX GRAPH MODE POST TRAINING STATIC QUANTIZATION”, which I used to quantize my model. It seems to have worked, since all the layers and weights are now quantized.

Before moving on, I wanted to compare the weights of the fully quantized model with what I get when I quantize only a single tensor myself, using the same scale and zero point.

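```python
import torch

# Quantize output channel 0 of the first conv weight by hand,
# using the scale and zero point reported by the quantized model
x = torch.quantize_per_tensor(model.state_dict()['features.0.0.weight'][0],
                              scale=0.0057970373891294, zero_point=0, dtype=torch.qint8)

# The same slice taken directly from the FX-quantized model
y = model_quantized.state_dict()['features.0.0.weight'][0]
print(f'{x}\n{y}\n')
```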

Output:

```
tensor([[[-0.1333, -0.1971,  0.1623],
         [-0.0174, -0.2377,  0.0928],
         [ 0.1101, -0.2493,  0.1913]],

        [[ 0.0754, -0.0406, -0.0232],
         [ 0.0232,  0.1797,  0.3536],
         [-0.0116,  0.0812,  0.1565]],

        [[-0.0985, -0.1043, -0.1681],
         [-0.0058, -0.1101, -0.0928],
         [-0.0174, -0.1101,  0.1217]]], size=(3, 3, 3), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.0057970373891294,
       zero_point=0)

tensor([[[-0.2261, -0.3304,  0.2667],
         [-0.0290, -0.4058,  0.1565],
         [ 0.1913, -0.4174,  0.3246]],

        [[ 0.1275, -0.0638, -0.0348],
         [ 0.0406,  0.3014,  0.5971],
         [-0.0174,  0.1391,  0.2609]],

        [[-0.1681, -0.1797, -0.2783],
         [-0.0116, -0.1913, -0.1565],
         [-0.0290, -0.1855,  0.2087]]], size=(3, 3, 3), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.0057970373891294,
       zero_point=0)
```

The original weights:

```python
model.state_dict()['features.0.0.weight'][0]
```

```
tensor([[[-0.1360, -0.1972,  0.1600],
         [-0.0184, -0.2405,  0.0924],
         [ 0.1127, -0.2482,  0.1923]],

        [[ 0.0749, -0.0395, -0.0211],
         [ 0.0253,  0.1814,  0.3552],
         [-0.0111,  0.0820,  0.1559]],

        [[-0.1000, -0.1057, -0.1673],
         [-0.0062, -0.1127, -0.0930],
         [-0.0168, -0.1123,  0.1228]]])
```
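As a sanity check on the hand-quantized tensor `x`, here is a minimal pure-Python sketch of per-tensor affine quantization, assuming the usual formula q = clamp(round(w / scale) + zero_point, -128, 127) for qint8 and dequantization as (q - zero_point) * scale. The helpers `quantize` and `dequantize` are hypothetical names, not PyTorch APIs, and Python's `round` (round half to even) is assumed to be close enough to PyTorch's rounding for this check. Applied to the first row of the original weights with the reported scale, it reproduces the first row of `x`.

```python
# Minimal sketch of per-tensor affine quantization to qint8 (assumed range [-128, 127]).
# `quantize` / `dequantize` are hypothetical helpers, not PyTorch APIs.

QMIN, QMAX = -128, 127

def quantize(values, scale, zero_point):
    """Map floats to integer codes: clamp(round(v / scale) + zero_point, QMIN, QMAX)."""
    return [min(max(round(v / scale) + zero_point, QMIN), QMAX) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map integer codes back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]

scale = 0.0057970373891294
zero_point = 0

# First row of the original float weights, copied from the printout above
w_row = [-0.1360, -0.1972, 0.1600]

q_row = quantize(w_row, scale, zero_point)
print(q_row)                                                        # [-23, -34, 28]
print([round(v, 4) for v in dequantize(q_row, scale, zero_point)])  # [-0.1333, -0.1971, 0.1623]
```

The manually quantized values land on the nearest grid points of the original weights, so `x` behaves as expected; the open question is only why `y` differs.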

As you can see, the weights of the quantized model are quite different from the original ones, even though both `x` and `y` use the qint8 dtype with the same scale and zero point.

Long story short: what could be causing the weights to change? From what I understand so far, it looks as if the weights are multiplied by some kind of factor during the quantization process. Could some kind of normalization help?
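One way to test the "some kind of factor" idea is to fit a single multiplicative factor k between the original weights `w` and the quantized-model weights `y` for one output channel, using least squares (k = Σ w·y / Σ w²), and then check how closely k·w reproduces `y`. The sketch below does this in plain Python on the first 3×3 slice from the printouts above; the values are copied from the post, so they carry print rounding, and `fit_scale_factor` is a hypothetical helper name. For these nine values k comes out at roughly 1.68, and every element of `y` lies within one quantization step of k·w, which is consistent with a single factor for this channel.

```python
# Fit one multiplicative factor k between original and quantized-model weights
# (least squares: k = sum(w*y) / sum(w*w)) and report the worst residual.
# Values are copied from the printed tensors (output channel 0, first 3x3 slice).

def fit_scale_factor(w, y):
    """Hypothetical helper: the k that minimizes sum((y - k*w)**2)."""
    return sum(a * b for a, b in zip(w, y)) / sum(a * a for a in w)

w = [-0.1360, -0.1972, 0.1600,
     -0.0184, -0.2405, 0.0924,
      0.1127, -0.2482, 0.1923]   # original float weights
y = [-0.2261, -0.3304, 0.2667,
     -0.0290, -0.4058, 0.1565,
      0.1913, -0.4174, 0.3246]   # weights of the quantized model, as printed

k = fit_scale_factor(w, y)
max_residual = max(abs(b - k * a) for a, b in zip(w, y))
scale = 0.0057970373891294       # quantization step reported by both tensors

print(f"k = {k:.3f}")
print(f"max |y - k*w| = {max_residual:.4f} (step = {scale:.4f})")
```

With the real tensors, the same fit could be computed as `(w * y.dequantize()).sum() / (w * w).sum()`, where `w` is the float weight slice; repeating it for each output channel would show whether the factor differs from channel to channel.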

Thank you