Hi
I want to understand how quantization parameters are stored in PyTorch.
Consider the following toy example:
import torch
import torch.nn as nn

torch.random.manual_seed(0)

# Toy model: two Linear layers and a ReLU, wrapped in quant/dequant stubs
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.a = nn.Linear(1, 8)
        self.act = nn.ReLU()
        self.b = nn.Linear(8, 2)
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.a(x)
        x = self.act(x)
        x = self.b(x)
        x = self.dequant(x)
        return x

# Create the original (unprepared) model
m_orig = Model()
print('Original model', m_orig)

# Create the prepared model
m_orig.qconfig = torch.ao.quantization.get_default_qat_qconfig()
m = torch.ao.quantization.prepare(m_orig, inplace=False)
print('Prepared', m)

# Convert to a quantized model
qm = torch.ao.quantization.convert(m, inplace=False)
In this case, the weight_fake_quant keys are missing from the state dict:
...
(a): Linear(
  in_features=1, out_features=8, bias=True
  (activation_post_process): FusedMovingAvgObsFakeQuantize(
    fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True
    (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
  )
)
...
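To double-check that this isn't just a printing artifact, I also listed the keys in the prepared model's state dict directly (the substring filter below is just my own quick check):

# List the fake-quant-related keys in the prepared model's state dict.
# With prepare(), no '*.weight_fake_quant.*' keys show up here.
for k in m.state_dict().keys():
    if 'fake_quant' in k or 'activation_post_process' in k:
        print(k)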
However, if I replace prepare() with prepare_qat(), these keys reappear:
...
(a): Linear(
  in_features=1, out_features=8, bias=True
  (weight_fake_quant): FusedMovingAvgObsFakeQuantize(
    fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False
    (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
  )
  (activation_post_process): FusedMovingAvgObsFakeQuantize(
    fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True
    (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
  )
)
...
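For completeness, this is the variant that produces the output above. The only substantive change is the prepare call; since prepare_qat() expects a model in training mode, I set that explicitly:

# Same model and qconfig as before, but prepared for QAT.
# prepare_qat() expects the model to be in training mode.
m_orig.train()
m_qat = torch.ao.quantization.prepare_qat(m_orig, inplace=False)
print('Prepared (QAT)', m_qat)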
This behavior is strange to me. I would expect both prepare() and prepare_qat() to add weight quantization, but that isn't the case. I guess I'm still trying to understand the difference between these two functions. Why wouldn't prepare() also quantize the weights? Which one should I be using if I want to obtain quantized weights?
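Concretely, what I'm after is reading the int8 weights and their quantization parameters out of the converted model, something like the sketch below (I'm assuming here that the converted layer is a quantized Linear and that its weight ends up per-channel quantized, per the qscheme shown above):

# Hypothetical inspection of the converted model's quantized weights.
w = qm.a.weight()                     # quantized weight tensor
print(torch.int_repr(w))              # raw int8 values
print(w.q_per_channel_scales())       # per-channel scales
print(w.q_per_channel_zero_points())  # per-channel zero points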