Nothing. There's no quantized hardtanh op because it's an elementwise operation whose range is determined by the input, and it works equally well on quantized and non-quantized tensors:
>>> ht = torch.nn.Hardtanh()
>>> x = torch.randn(3, 3)
>>> xq = torch.quantize_per_tensor(x, 1.0, 0, torch.quint8)
>>> ht(xq)
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]], size=(3, 3), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=1.0, zero_point=0)
It's essentially the same as torch.clamp(x, -1, 1); you don't need output qparams for that.
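You can check the equivalence directly (a quick sketch, assuming the xq from above and that your PyTorch version supports clamp on quantized tensors):

>>> yq = torch.clamp(xq, -1, 1)  # same clamping behavior as Hardtanh
>>> torch.equal(ht(xq).int_repr(), yq.int_repr())
True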
I assume you are using FX quantization, since hardtanh doesn't even show up in the eager mode quantization mappings, so it wouldn't receive an observer there.
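You can verify that yourself (a rough sketch; the mappings live under torch.ao.quantization in newer releases, torch.quantization in older ones, and the exact contents vary by version):

>>> from torch.ao.quantization.quantization_mappings import get_default_static_quant_module_mappings
>>> torch.nn.Hardtanh in get_default_static_quant_module_mappings()  # not mapped, so no observer in eager mode
False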
Edit: to answer your specific question about whether the diagram is correct, are you asking about how it works in QAT or once converted? For the converted model, the output is never fp32; the scale/zero_point are inputs to the quantized kernel and the output comes out already quantized. Also, nothing is ever plain int8, it's qint8 or quint8. In QAT, nothing is in int8/qint8/quint8; everything happens in fp32 but goes through fake quant ops that simulate the quantize/dequantize round trip while leaving the values in fp32. In that case the output does come out as fp32 and then goes into another fake quant with a specified scale and zero point.
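As a rough illustration of the fake quant behavior (the scale/zero_point and quant range here are made-up per-tensor affine settings, not ones from any particular model):

>>> x = torch.randn(3)
>>> xfq = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)
>>> xfq.dtype  # still fp32, but the values are snapped to the quantization grid
torch.float32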