How to extract the quantized weights of a quantized NN model

I am using post-training quantization and trying to extract the quantized weights for the inference phase, but I failed.
I tried to use this directly:

for weight in quantized_model.state_dict():
    np.set_printoptions(suppress=True)
    print(weight, "\n", quantized_model.state_dict()[weight].detach().cpu().clone().numpy())

and got "TypeError: NumPy conversion for QuantizedCPUQInt8Type is not supported".

Could you give me any advice on extracting the quantized weights from the quantized model?
Thank you very much!!

Check out https://pytorch.org/docs/stable/quantization.html#quantized-torch-tensor-operations. Some options:

  1. Convert your quantized tensor to floating point with x.dequantize().
  2. Get the raw integer values with x.int_repr(); this should be used together with x.q_scale() and x.q_zero_point().
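Both options can be sketched on a toy per-tensor quantized tensor (the tensor here is a stand-in for a real quantized weight; the names are illustrative):

```python
import torch

# A per-tensor quantized tensor standing in for a quantized weight
w = torch.quantize_per_tensor(torch.randn(3, 3), scale=0.05, zero_point=0,
                              dtype=torch.qint8)

# Option 1: convert back to floating point
w_fp = w.dequantize()                 # torch.float32 tensor

# Option 2: raw integers plus the quantization parameters
w_int = w.int_repr()                  # torch.int8 tensor
scale = w.q_scale()                   # Python float
zp = w.q_zero_point()                 # Python int

# The two views are consistent: fp = (int - zero_point) * scale
assert torch.allclose(w_fp, (w_int.to(torch.float32) - zp) * scale)
```

Note that q_scale()/q_zero_point() only apply to per-tensor quantization; per-channel tensors use different accessors (see below in the thread).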

The error message shows that the weight is already a quantized weight; it's just that numpy conversion is not supported for quantized tensors, so if you remove the .numpy() call it should work.
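A sketch of the loop from the question with that fix applied (`quantized_model` stands for the converted model in the question; any quantized entries are dequantized before the numpy conversion):

```python
import numpy as np
import torch

def print_weights(quantized_model):
    np.set_printoptions(suppress=True)
    for name, param in quantized_model.state_dict().items():
        if param.is_quantized:
            # .numpy() is unsupported on quantized tensors; dequantize first
            print(name, "\n", param.dequantize().cpu().numpy())
        else:
            print(name, "\n", param.detach().cpu().numpy())
```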


If I have a quantized weight, how can I extract it from each layer? How can I separate the tensor, scale, and zero_point into arrays or numpy arrays?

(Pdb) quantized_model.state_dict()['model_fp32.conv1.weight']
tensor([[[[ 6.6680e-03,  5.3344e-03,  2.0004e-02,  ..., -2.8006e-02,
          -6.8014e-02, -1.4670e-02],
         [ 1.3336e-02,  9.3353e-03,  2.1338e-02,  ..., -1.1602e-01,
          -3.7341e-02,  5.0677e-02],
         [ 2.0004e-02, -1.8671e-02, -5.3344e-02,  ..., -7.3348e-02,
           9.8687e-02,  8.4017e-02],
         ...,
         [-4.9344e-02, -2.9339e-02,  1.3336e-03,  ...,  1.3203e-01,
           9.3353e-03, -6.0012e-02],
         [-4.0008e-03,  2.1338e-02, -4.0008e-03,  ...,  5.3344e-03,
          -6.6680e-02,  3.0673e-02],
         [ 6.6680e-03,  6.6680e-03,  6.6680e-03,  ..., -3.0673e-02,
           1.2002e-02,  8.4017e-02]],

     [[ 1.0669e-02,  2.0004e-02,  3.2007e-02,  ...,  4.0008e-03,
       -3.2007e-02,  5.3344e-03],
      [ 2.9339e-02,  3.6007e-02,  5.6012e-02,  ..., -8.9352e-02,
       -2.4005e-02,  2.9339e-02],
      [ 4.8010e-02,  2.9339e-02, -6.6680e-03,  ..., -5.6012e-02,
        7.4682e-02,  1.6003e-02],
      ...,
      [-1.8671e-02,  1.0669e-02,  2.8006e-02,  ...,  8.5351e-02,
       -8.0017e-02, -1.6803e-01],
      [ 1.8671e-02,  4.2675e-02, -1.3336e-03,  ..., -6.9348e-02,
       -1.5070e-01, -4.1342e-02],
      [ 2.1338e-02,  0.0000e+00, -2.8006e-02,  ..., -1.0936e-01,
       -4.0008e-02,  5.3344e-02]],

      ...,
      [ 3.7336e-02,  3.6070e-02,  3.9867e-02,  ...,  5.6953e-02,
        5.9484e-02,  3.8602e-02],
      [ 2.3414e-02,  5.0625e-02,  4.7461e-02,  ...,  6.9610e-02,
        5.8219e-02,  4.4930e-02],
      [ 6.9610e-03,  4.0500e-02,  5.0625e-02,  ...,  5.7586e-02,
        6.7078e-02,  5.1891e-02]]]], size=(64, 3, 7, 7), dtype=torch.qint8,
   quantization_scheme=torch.per_channel_affine,
   scale=tensor([1.3336e-03, 1.6211e-03, 1.5573e-03, 1.2130e-03, 4.1713e-04, 1.1682e-03,
    3.4338e-04, 1.4215e-03, 1.3556e-03, 1.1205e-03, 1.8750e-03, 8.0779e-07,
    7.6834e-04, 1.2095e-03, 4.1902e-04, 1.2486e-03, 9.4629e-04, 2.1297e-03,
    3.7952e-07, 4.7981e-04, 1.3093e-04, 1.4277e-03, 2.5236e-03, 1.3021e-03,
    2.5295e-03, 3.1860e-04, 3.6392e-04, 2.9317e-03, 1.9204e-03, 3.7749e-07,
    8.5399e-04, 9.0634e-04, 2.1589e-03, 1.1871e-03, 2.4718e-03, 3.7885e-07,
    2.1255e-06, 1.6729e-03, 8.8754e-07, 1.0482e-05, 1.2293e-03, 2.2856e-04,
    4.7812e-04, 3.0442e-03, 2.9528e-03, 1.5815e-03, 6.9944e-04, 1.3110e-03,
    2.4429e-03, 3.9633e-04, 1.4372e-03, 2.6385e-04, 2.5296e-04, 6.3496e-07,
    3.4408e-04, 3.4195e-04, 1.0956e-06, 6.7232e-04, 6.8186e-04, 2.4606e-03,
    3.9087e-04, 5.1462e-04, 8.7948e-07, 6.3281e-04], dtype=torch.float64),
   zero_point=tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
   axis=0)

Thanks,

Stephen

I typed quantized_model.state_dict()['model_fp32.conv1.weight'].q_scale() and got:

*** RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

The printed tensor shows quantization_scheme=torch.per_channel_affine.
What's wrong with that?

I can print out int_repr() with dtype=torch.int8.

Thanks,

Stephen

I tried to narrow down to a simple question.

x = torch.tensor([[-1.0, 0.0], [1.0, 2.0]])
y = torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]), torch.tensor([10, 0]), 0, torch.quint8)
y
tensor([[-1.,  0.],
        [ 1.,  2.]], size=(2, 2), dtype=torch.quint8,
       quantization_scheme=torch.per_channel_affine,
       scale=tensor([0.1000, 0.0100], dtype=torch.float64),
       zero_point=tensor([10,  0]), axis=0)

y.int_repr()
tensor([[  0,  10],
        [100, 200]], dtype=torch.uint8)

y.q_scale()
*** RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I installed pytorch version : 1.10.0

Thanks,
Stephen

Hi, not sure if you have already solved this, but this is because torch supports two different quantization schemes: per-tensor affine and per-channel affine. In per-tensor affine, a single scale and zero point are saved per tensor, so you can use .q_scale() as you did to get that single value.

However, in your case PyTorch is using per-channel affine, which means there are N scale and zero point values, where N = number of channels. In this case you have to use `.q_per_channel_scales()` to return a tensor of all the scale values (and `.q_per_channel_zero_points()` for the zero points).
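Using the toy tensor from earlier in the thread, a sketch of the per-channel accessors, including converting everything to numpy arrays as asked above:

```python
import torch

# The toy per-channel example from the thread
x = torch.tensor([[-1.0, 0.0], [1.0, 2.0]])
y = torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]),
                               torch.tensor([10, 0]), 0, torch.quint8)

# Per-channel accessors (q_scale()/q_zero_point() raise here)
scales = y.q_per_channel_scales()        # tensor([0.1000, 0.0100], dtype=torch.float64)
zps = y.q_per_channel_zero_points()      # tensor([10, 0])
axis = y.q_per_channel_axis()            # 0

# Separate into numpy arrays
ints = y.int_repr().numpy()              # uint8 integer values
scales_np = scales.numpy()
zps_np = zps.numpy()

# Consistency check: fp = (int - zero_point) * scale, per channel along axis 0
recon = (ints.astype("float64") - zps_np[:, None]) * scales_np[:, None]
```

Here `recon` recovers the original float values [[-1, 0], [1, 2]], confirming how the integer tensor, scales, and zero points fit together.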