QuantStub/DeQuantStub confusion for QAT

Hello,

I’m kind of confused by the use of Quant/DeQuantStubs for Quantization Aware Training.

From my understanding, only the layers between the QuantStub and DeQuantStub are supposed to be quantised (is that correct?). But for my model, when I place the stubs around just the backbone:

x = self.quant0(x)
x = self.backbone0(x)
x = self.dequant(x)
confidence = self.classification_headers0(x)

and then look at the layers in classification_headers0 before and after preparation:

print(model.classification_headers0)
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)
print(model.classification_headers0)

I get

Sequential(
  (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)
  (1): ReLU()
  (2): Conv2d(64, 6, kernel_size=(1, 1), stride=(1, 1))
)
Sequential(
  (0): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
  (1): ReLU(
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
  (2): Conv2d(
    64, 6, kernel_size=(1, 1), stride=(1, 1)
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8), scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
)

Why are the layers in classification_headers0 prepared for quantisation too?

Which layers get quantized is controlled by the qconfig. When you do model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm'), it applies the default settings for which modules to swap to the entire model, not just the part between the stubs. The QuantStub/DeQuantStub placement only marks where activations are converted between float and quantized tensors in the final model; it does not limit which modules prepare_qat instruments.
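As a minimal sketch (reusing the backbone0 / classification_headers0 names from your snippet), you can keep the head in float by scoping the qconfig:

import torch

# Option 1: put the qconfig only on the submodule you want quantized.
# prepare_qat only instruments modules that have a qconfig set.
model.backbone0.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

# Option 2: set a model-wide qconfig, then opt specific submodules out
# by overriding their qconfig with None before preparation.
# model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
# model.classification_headers0.qconfig = None

torch.quantization.prepare_qat(model, inplace=True)

# classification_headers0 should now keep its plain float Conv2d/ReLU modules,
# while backbone0 gets FakeQuantize modules attached.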

The mapping argument to prepare_qat (https://github.com/pytorch/pytorch/blob/733b8c23c436d906125c20f0a64692bf57bce040/torch/quantization/quantize.py#L289) can be used to customize which layers you’d like to quantize.
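For example, a rough sketch of a restricted mapping (just an illustration, not the default mapping) that only swaps Conv2d modules for their QAT counterparts:

import torch
import torch.nn as nn

# Only module types listed in the mapping get swapped for their QAT
# (weight fake-quant) versions; other modules keep their float class.
custom_mapping = {
    nn.Conv2d: torch.nn.qat.Conv2d,
}

model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, mapping=custom_mapping, inplace=True)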