Quantization scales and bias

I was using PyTorch for post-training quantization on my resnet18 model. The following is part of the code.

net.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.quint8, qscheme=torch.per_tensor_symmetric), 
    weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric))

I wanted to print the bias and scale that are used internally for each tensor.
Can someone please help me do it the right way?

Thanks.

Hi @shas19, if you print out the quantized network, it should show the scale and zero_point of the various layers. Is there something else you are looking for? Could you be more specific?
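
For example, something along these lines should print them per layer (a minimal sketch, assuming net has already been prepared, calibrated, and converted with the eager-mode API):

import torch

# After torch.quantization.convert(), quantized modules (e.g. quantized
# Conv2d / Linear) store their output quantization parameters as attributes.
for name, module in net.named_modules():
    if hasattr(module, 'scale') and hasattr(module, 'zero_point'):
        print(name, 'scale =', module.scale, 'zero_point =', module.zero_point)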

Hi,

I have the same question about printing quantized weights. Is there a way to see the quantized integer values that are used in the convolution operators, as in the example shown below:

class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()
        # QuantStub converts tensors from floating point to quantized
        self.quant = torch.quantization.QuantStub()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.bn = torch.nn.BatchNorm2d(1)
        self.relu = torch.nn.ReLU()
        # DeQuantStub converts tensors from quantized to floating point
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        print(x)  # <- x is a quantized tensor here
        print(self.conv.weight)  # <- prints: <bound method Conv2d.weight of QuantizedConv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)>
        print(self.conv.bias)    # <- prints: <bound method Conv2d.bias of QuantizedConv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)>
        x = self.bn(x)
        x = self.relu(x)
        x = self.dequant(x)
        return x

from torch.quantization import (QConfig, FakeQuantize,
                                MovingAverageMinMaxObserver,
                                prepare_qat, convert)

# convert the model to QAT
qconfig = QConfig(
    activation = FakeQuantize.with_args(observer=MovingAverageMinMaxObserver),
    weight = FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8)
)
model_fp32 = M()
model_fp32.train()
model_fp32.qconfig = qconfig
model_fp32_prepared = prepare_qat(model_fp32)
model_fp32_prepared.eval()
model_int8 = convert(model_fp32_prepared)
# run the model
input_fp32 = torch.randn(4, 1, 4, 4)
res = model_int8(input_fp32)
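
As a side note, the quantized integer values can be read back outside of forward(). A sketch, assuming the per-tensor weight qconfig above (with a per-channel scheme, q_scale() would not apply):

# On quantized modules, weight and bias are accessor methods rather than
# parameters, which is why the prints inside forward() show bound methods.
w = model_int8.conv.weight()          # the quantized weight tensor
print(w.int_repr())                   # raw int8 values used by the kernel
print(w.q_scale(), w.q_zero_point())  # per-tensor quantization parameters
print(model_int8.conv.bias())         # the bias is kept in floating point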

Also, I looked into the code and found the QAT conv module calling fake quant on the weight before doing the convolution.
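
As far as I can tell, the forward of the QAT conv is roughly equivalent to the following (a paraphrased sketch, not the actual source; details vary across PyTorch versions):

import torch.nn as nn
import torch.nn.functional as F

class QATConv2dSketch(nn.Conv2d):
    # rough equivalent of torch.nn.qat.Conv2d: fake-quantize the fp32
    # weight, then run an ordinary fp32 convolution on the result
    def __init__(self, *args, qconfig=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.weight_fake_quant = qconfig.weight()  # a FakeQuantize module

    def forward(self, input):
        return F.conv2d(input, self.weight_fake_quant(self.weight), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)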

And the fake quant calls a fake-quantize affine function, for example the fake_quantize_per_tensor_affine function in pytorch/torch/onnx/symbolic_opset10.py.

I printed the output of self.weight_fake_quant(self.weight) and saw only quantized-then-dequantized floating-point numbers. My question is: the convolution seems to use the floating-point r instead of the integers q from the formula r = S * (q - Z) in the QAT paper?
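
Working through the formula from the paper, fake quantization computes r' = S * (clamp(round(r / S) + Z) - Z) entirely in floating point, so the fp32 convolution sees exactly the values an int8 kernel could reproduce; the real integer kernels are only used after convert. A small numeric sketch (the scale and zero point are made-up values):

import torch

r = torch.tensor([0.30, -1.23, 0.72])
S, Z = 0.1, 0   # example scale / zero point (assumed for illustration)

# the integers an int8 kernel would use: q = clamp(round(r / S) + Z, -128, 127)
q = torch.clamp(torch.round(r / S) + Z, -128, 127)

# what fake quant feeds into the fp32 convolution: r' = S * (q - Z)
fq = torch.fake_quantize_per_tensor_affine(r, S, Z, -128, 127)

print(q)            # tensor([  3., -12.,   7.])
print(fq)           # tensor([ 0.3000, -1.2000,  0.7000])
print(S * (q - Z))  # same values as fq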