Pre-quantized model with qint8 weights seems to be decimal

chenxj · December 6, 2022, 8:43am

>>> a["features.16.conv.1.0.weight"]
tensor([[[[ 1.6094,  2.3566,  1.6094],
          [ 1.0921,  0.2874,  1.0921],
          [ 0.5173, -1.2071,  0.1724]]],


        [[[ 0.4598,  0.2299,  0.4598],
          [ 0.3449,  2.2417,  0.5748],
          [ 0.4598,  0.7472,  0.4024]]],


        [[[ 1.3795, -0.4598, -0.6897],
          [ 2.1842, -0.8622, -1.1496],
          [ 1.0346, -0.8047, -0.7472]]],


        ...,


        [[[-0.8622, -1.6669, -0.9771],
          [ 0.5173, -0.6897,  0.7472],
          [ 0.8622,  2.1842,  0.9771]]],


        [[[ 0.4024, -0.3449,  0.4598],
          [-0.0575, -1.1496,  0.0000],
          [ 1.2071, -0.5748,  1.2071]]],


        [[[ 0.1724,  0.4598,  0.2299],
          [ 0.2299,  1.5519,  0.3449],
          [ 0.4598,  0.4598,  0.4598]]]], size=(960, 1, 3, 3),
       dtype=torch.qint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.05747907608747482, zero_point=0)

But a["features.16.conv.1.0.weight"][0, 0, 0, 1].item() is 2.
Can anyone explain this observation? Thanks.

Vasiliy_Kuznetsov · December 7, 2022, 5:37pm

Quantized tensors represent values in the floating point domain. When you print a quantized tensor, the values PyTorch shows are the fp32 values that the quantized tensor represents, not the integers stored underneath.
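To see both views side by side, here is a minimal sketch. It assumes a small stand-in tensor (the first row of the weight from the question, not the real model) and reuses the scale and zero point from the printout. int_repr() exposes the stored integers and dequantize() returns an ordinary float tensor.

```python
import torch

# Stand-in for the weight in the question: three values from its first row.
# The scale and zero point are copied from the printed tensor.
scale, zero_point = 0.05747907608747482, 0
w_fp32 = torch.tensor([1.6094, 2.3566, 1.6094])

# Quantize to qint8 with a per-tensor affine scheme.
w_q = torch.quantize_per_tensor(w_fp32, scale, zero_point, torch.qint8)

print(w_q)               # shows the represented fp32 values, as in the question
print(w_q.int_repr())    # the stored int8 integers
print(w_q.dequantize())  # a regular float tensor holding the represented values
```

On a real checkpoint the same two calls should work directly on the loaded weight, assuming it is a per-tensor quantized tensor like the one printed above.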

When you print a["features.16.conv.1.0.weight"][0, 0, 0, 1].item(), you are looking at the raw value, which then needs to be transformed with fp = float(q - zp) * scale to convert it to the floating point domain.
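The formula can be checked by hand with the numbers from the question. The sketch below is plain Python, and the helper names quantize and dequantize are made up for illustration. They are not PyTorch functions.

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a stored integer back to the floating point domain."""
    return float(q - zero_point) * scale


def quantize(fp: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Inverse mapping: round to the nearest integer and clamp to the int8 range."""
    q = round(fp / scale) + zero_point
    return max(qmin, min(qmax, q))


# Scale and zero point copied from the printed tensor in the question.
scale = 0.05747907608747482
zero_point = 0

q = quantize(2.3566, scale, zero_point)
print(q)                                           # 41
print(round(dequantize(q, scale, zero_point), 4))  # 2.3566
print(round(dequantize(2, scale, zero_point), 4))  # 0.115
```

Under this formula the printed 2.3566 corresponds to a stored integer of 41, while an integer of 2 would map to about 0.115. If the goal is to inspect the stored integers, calling int_repr() on the quantized tensor is the more reliable route than .item().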