Hi, I am new to quantization with PyTorch
From various tutorials, I understand that the procedure for static quantization goes like this:
1- Add QuantStubs/DeQuantStubs before the part to quantize
2- Add a qconfig to the modules that should be quantized
3- Call prepare() to set up the observers (and fake quantization? see the sketch after this list)
4- Calibrate with data/QAT
5- Convert to quantized model using convert()
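For context, here is my mental model of the difference between the two prepare entry points, written as a minimal sketch (the throwaway Sequential, the 'fbgemm' backend string, and the use of get_default_qat_qconfig/prepare_qat for the QAT variant are my assumptions, not something I copied from a specific tutorial):

import torch
import torch.nn as nn

# Throwaway model, just to compare what the two prepare paths insert
def make_toy():
    return nn.Sequential(
        torch.ao.quantization.QuantStub(),
        nn.Linear(4, 4),
        nn.ReLU(),
        torch.ao.quantization.DeQuantStub(),
    )

# Static PTQ path: prepare() inserts observer modules
ptq = make_toy().eval()
ptq.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')
print(torch.ao.quantization.prepare(ptq))

# QAT path: prepare_qat() inserts FakeQuantize modules instead
qat = make_toy().train()
qat.qconfig = torch.ao.quantization.get_default_qat_qconfig('fbgemm')
print(torch.ao.quantization.prepare_qat(qat))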
Based on what I understand, I would expect that after calling prepare(), the QuantStub tracks statistics and, in the QAT case, also simulates quantization error in the forward pass. However, this isn't what I see:
import torch
import torch.nn as nn

torch.random.manual_seed(0)

# Toy model: two Linear layers and a ReLU, wrapped in Quant/DeQuant stubs
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.a = nn.Linear(1, 8)
        self.act = nn.ReLU()
        self.b = nn.Linear(8, 2)
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.a(x)
        x = self.act(x)
        x = self.b(x)
        x = self.dequant(x)
        return x

# Create original (unprepared) model
m_orig = Model()
print('Original model', m_orig)

# Create prepared model (fake quantize should be added here???)
m_orig.qconfig = torch.ao.quantization.get_default_qconfig()
m = torch.ao.quantization.prepare(m_orig, inplace=False)
print('Prepared', m)

# Random input
def runRandom(model):
    a = torch.randn(1, 5, 1)
    b = model(a)
    return b

# Fixed input
x = torch.randn(1, 5, 1)
def runFixed(model):
    b = model(x)
    return b

# Calibrate
print('Random input', runRandom(m))

# Convert to quantized model
qm = torch.ao.quantization.convert(m, inplace=False)

# Test
print('[Prepared model] Fixed input', runFixed(m))
print('[Original model] Fixed input', runFixed(m_orig))
print("Quantized model", qm)
print("[Quantized model] Fixed input", runFixed(qm))
In the code above, however, the original (unprepared) model and the prepared model produce exactly the same output, while the quantized model produces a different output from both:
[Prepared model] Fixed input tensor([[[-0.0310, -0.0343],
[ 0.2967, 0.2313],
[-0.1149, -0.1024],
[-0.3348, -0.1802],
[-0.1974, -0.1572]]], grad_fn=<ViewBackward0>)
[Original model] Fixed input tensor([[[-0.0310, -0.0343],
[ 0.2967, 0.2313],
[-0.1149, -0.1024],
[-0.3348, -0.1802],
[-0.1974, -0.1572]]], grad_fn=<ViewBackward0>)
[Quantized model] Fixed input tensor([[[-0.0282, -0.0323],
[ 0.2381, 0.1856],
[-0.1170, -0.1049],
[-0.3349, -0.1775],
[-0.1977, -0.1574]]])
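In case it helps, I also printed the observers that prepare() attached to the prepared model after the calibration pass (a minimal sketch; I am assuming the eager-mode workflow stores them as activation_post_process, and m is the prepared model from the script above):

# Observers attached during prepare(); they should hold the statistics
# recorded during the calibration run above
print(m.quant.activation_post_process)   # observer on the QuantStub
print(m.a.activation_post_process)       # observer after the first Linear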
I would think that the prepared model would simulate quantization effects to some extent. Is there something I’m missing here?