Hello,
I'm a bit confused by the use of QuantStub/DeQuantStub
for Quantization Aware Training.
From my understanding, only the layers between the QuantStub and DeQuantStub are supposed to be quantised (is that correct?). However, in my model, when I place the stubs around just the backbone:
x = self.quant0(x)
x = self.backbone0(x)
x = self.dequant(x)
confidence = self.classification_headers0(x)
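For context, the stubs are declared in the model's __init__ roughly like this (a simplified sketch; the class and argument names are placeholders for my actual code, and the real backbone and heads are built elsewhere):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, backbone, classification_headers):
        super().__init__()
        self.quant0 = torch.quantization.QuantStub()     # float -> quantised boundary
        self.dequant = torch.quantization.DeQuantStub()  # quantised -> float boundary
        self.backbone0 = backbone                        # the only part I want quantised
        self.classification_headers0 = classification_headers  # should stay float

with the forward pass as shown above.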
and then print the layers in classification_headers0 before and after preparation:
print(model.classification_headers0)
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)
print(model.classification_headers0)
Before preparation, I get:
Sequential(
  (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)
  (1): ReLU()
  (2): Conv2d(64, 6, kernel_size=(1, 1), stride=(1, 1))
)
and after prepare_qat:

Sequential(
  (0): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
  (1): ReLU(
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
  (2): Conv2d(
    64, 6, kernel_size=(1, 1), stride=(1, 1)
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
)
Why are the layers in classification_headers0 prepared for quantisation too?
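Is the expected way to handle this something like the snippet below, i.e. clearing the qconfig on the sub-modules I don't want quantised before calling prepare_qat? (This is just my guess at a workaround, not something I've confirmed is the intended approach.)

model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

# guess: clear the qconfig on the heads so prepare_qat leaves them alone
model.classification_headers0.qconfig = None

torch.quantization.prepare_qat(model, inplace=True)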