Quantization: Prepared model & Converted model give different outputs

Hi, I am new to quantization with PyTorch.

From various tutorials, I understand that the procedure for static quantization goes like this:
1- Add a QuantStub before and a DeQuantStub after the part to quantize
2- Add a qconfig to the modules that should be quantized
3- Call prepare() to set up the observers (and fake quantization?)
4- Calibrate with data / do QAT (see the calibration sketch right after this list)
5- Convert to a quantized model using convert()
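
For step 4 in the post-training case, "calibrate" just means pushing a few representative batches through the prepared model so the observers can record activation ranges. A minimal sketch, assuming prepared_model is the output of prepare() and calibration_loader is a placeholder DataLoader of representative inputs:

import torch

# Run representative data through the prepared model so that the
# observers inserted by prepare() record activation statistics.
prepared_model.eval()
with torch.no_grad():
    for inputs in calibration_loader:  # hypothetical DataLoader
        prepared_model(inputs)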

Based on this, I would assume that after calling prepare(), the QuantStub tracks statistics and also models quantization error in the forward pass (for QAT). However, this isn't what I see:

import torch
import torch.nn as nn

torch.random.manual_seed(0)

# Toy model. Two Linear layers & a ReLU
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.a = nn.Linear(1, 8)
        self.act = nn.ReLU()
        self.b = nn.Linear(8, 2)
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)

        x = self.a(x)
        x = self.act(x)
        x = self.b(x)

        x = self.dequant(x)
        return x

# Create original (unprepared model)
m_orig = Model()
print('Original model', m_orig)

# Create prepared model (Fake quantize should be added here???)
m_orig.qconfig = torch.ao.quantization.get_default_qconfig()
m = torch.ao.quantization.prepare(m_orig, inplace=False)
print('Prepared', m)


# Random input
def runRandom(model):
    a = torch.randn(1, 5, 1)
    b = model(a)
    return b

# Fixed input
x = torch.randn(1, 5, 1)
def runFixed(model):
    b = model(x)
    return b

# Calibrate
print('Random input', runRandom(m))

# Convert to quantized model
qm = torch.ao.quantization.convert(m, inplace=False)

# Test
print('[Prepared model] Fixed input', runFixed(m))
print('[Original model] Fixed input', runFixed(m_orig))

print("Quantized model", qm)
print("[Quantized model] Fixed input", runFixed(qm))

However, in the code above, the original (unprepared) model and the prepared model produce exactly the same output, while the quantized model produces a different output from either of them:

[Prepared model] Fixed input tensor([[[-0.0310, -0.0343],
         [ 0.2967,  0.2313],
         [-0.1149, -0.1024],
         [-0.3348, -0.1802],
         [-0.1974, -0.1572]]], grad_fn=<ViewBackward0>)
[Original model] Fixed input tensor([[[-0.0310, -0.0343],
         [ 0.2967,  0.2313],
         [-0.1149, -0.1024],
         [-0.3348, -0.1802],
         [-0.1974, -0.1572]]], grad_fn=<ViewBackward0>)
[Quantized model] Fixed input tensor([[[-0.0282, -0.0323],
         [ 0.2381,  0.1856],
         [-0.1170, -0.1049],
         [-0.3349, -0.1775],
         [-0.1977, -0.1574]]])

I would think that the prepared model would simulate quantization effects to some extent. Is there something I’m missing here?

The issue is the qconfig: the default post-training qconfig tells prepare() to insert only observers, which record activation statistics during calibration but pass the values through unchanged, so the prepared model behaves exactly like the float model. Fake quantization is only inserted when a QAT qconfig is used. In other words,

m_orig.qconfig = torch.ao.quantization.get_default_qconfig()

should have been

m_orig.qconfig = torch.ao.quantization.get_default_qat_qconfig()
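
To make the difference visible, here is a minimal sketch of the QAT variant on the same toy Model defined above, assuming the standard eager-mode recipe (QAT qconfig plus prepare_qat() on a model in training mode; the 'fbgemm' backend string is my assumption):

import torch

m_f = Model()
m_f.train()  # prepare_qat() expects a model in training mode

# The QAT qconfig makes prepare_qat() insert FakeQuantize modules, which
# simulate quantization error in the forward pass instead of only
# recording statistics like plain observers do.
m_f.qconfig = torch.ao.quantization.get_default_qat_qconfig('fbgemm')
m_qat = torch.ao.quantization.prepare_qat(m_f, inplace=False)

x = torch.randn(1, 5, 1)
print('[Float model]', m_f(x))          # unchanged float output
print('[QAT-prepared model]', m_qat(x)) # already differs: fake quantized

# After QAT training (or right away for a quick check), convert as before:
m_qat.eval()
qm = torch.ao.quantization.convert(m_qat, inplace=False)
print('[Quantized model]', qm(x))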