Hi @shas19, if you print out the quantized network, it should show the scale and zero_point of the various layers. Is there something else you are looking for? Could you be more specific?
I have the same question about printing quantized weights. Is there a way to see the quantized int values that are used in the convolution operators, as in the example below:
import torch
from torch.quantization import (
    QConfig, FakeQuantize, MovingAverageMinMaxObserver, prepare_qat, convert,
)

class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()
        # QuantStub converts tensors from floating point to quantized
        self.quant = torch.quantization.QuantStub()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.bn = torch.nn.BatchNorm2d(1)
        self.relu = torch.nn.ReLU()
        # DeQuantStub converts tensors from quantized to floating point
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        print(x)  # <- x is quantized here
        print(self.conv.weight)  # <- It prints: <bound method Conv2d.weight of QuantizedConv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)>
        print(self.conv.bias)  # <- It prints: <bound method Conv2d.bias of QuantizedConv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)>
        x = self.bn(x)
        x = self.relu(x)
        x = self.dequant(x)
        return x

# convert the model to QAT
qconfig = QConfig(
    activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver),
    weight=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8)
)
model_fp32 = M()
model_fp32.train()
model_fp32.qconfig = qconfig
model_fp32_prepared = prepare_qat(model_fp32)
model_fp32_prepared.eval()
model_int8 = convert(model_fp32_prepared)

# run the model
input_fp32 = torch.randn(4, 1, 4, 4)
res = model_int8(input_fp32)
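For what it's worth, the "bound method" output above suggests that on the converted module weight is a method rather than an attribute, so it probably needs to be called as self.conv.weight(). Assuming that call returns a quantized tensor, the integer values could be read with int_repr(). The sketch below is not taken from the model above: it builds a stand-in quantized tensor with torch.quantize_per_tensor, and the names w_fp32, scale and zero_point are made up for illustration.

```python
import torch

# Stand-in for the tensor that conv.weight() is assumed to return on a
# converted (quantized) module. Values and scale are illustrative only.
w_fp32 = torch.tensor([[-1.0, -0.5], [0.25, 1.0]])
scale, zero_point = 0.25, 0
w_q = torch.quantize_per_tensor(w_fp32, scale, zero_point, torch.qint8)

# int_repr() exposes the stored integers q as a plain int8 tensor
w_int = w_q.int_repr()

# q_scale() / q_zero_point() give S and Z for per-tensor quantization
print(w_int, w_q.q_scale(), w_q.q_zero_point())

# dequantize() applies r = S * (q - Z) and returns a float tensor
w_back = w_q.dequantize()
print(w_back)
```

If the assumption holds, the same calls on the model would look like model_int8.conv.weight().int_repr(). For per-channel quantized weights, q_per_channel_scales() and q_per_channel_zero_points() would be the ones to use instead of q_scale() and q_zero_point().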
Also, I looked into the code and found that the QAT conv module calls fake quant before doing the convolution.
The fake quant in turn calls a fake-quantize affine function, for example the fake_quantize_per_tensor_affine function in pytorch/torch/onnx/symbolic_opset10.py.
When I print the output of self.weight_fake_quant(self.weight), I see floating point numbers that have been quantized and then dequantized. So my question is: does the convolution use the floating point r rather than the integers q from the formula r = S*(q - Z) in the QAT paper?
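To make the question concrete, here is a minimal sketch of what fake quantization appears to compute. The helper fake_quant_reference and the sample values are hypothetical, written only for this illustration. It rounds to an integer q, clamps it, and immediately maps it back with r = S*(q - Z), then compares the result against torch.fake_quantize_per_tensor_affine.

```python
import torch

def fake_quant_reference(r, scale, zero_point, quant_min, quant_max):
    # Hypothetical reference helper, not a PyTorch API.
    # Quantize: q = clamp(round(r / S) + Z, quant_min, quant_max)
    q = torch.clamp(torch.round(r / scale) + zero_point, quant_min, quant_max)
    # Dequantize right away: r_hat = S * (q - Z), still a float tensor
    return q, scale * (q - zero_point)

w = torch.tensor([-1.3, -0.26, 0.0, 0.4, 2.0])
scale, zero_point = 0.25, 0

q, r_hat = fake_quant_reference(w, scale, zero_point, -128, 127)
builtin = torch.fake_quantize_per_tensor_affine(w, scale, zero_point, -128, 127)

print(q)        # integer-valued, but stored as floats
print(r_hat)    # floats snapped to the quantization grid
print(torch.allclose(builtin, r_hat))
```

If this reading is right, seeing floating point values from weight_fake_quant during QAT is expected: the convolution runs in floating point on weights snapped to the quantization grid, and the integers q are only stored and used after convert.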