I have scoured all the documentation I could locate and am still confused on a few points:
- the class docstring (https://pytorch.org/docs/stable/quantization.html#torch.quantization.QuantStub) says:
  > Quantize stub module, before calibration, this is same as an observer, it will be swapped as nnq.Quantize in convert.
  which unfortunately isn’t very helpful (the same as which “observer”?). The last part of the sentence suggests that at inference time this module is replaced with one that performs the actual float-to-int8 conversion. But what does the module do at calibration time?
- furthermore, the tutorials seem to suggest that QuantStub and DeQuantStub act as delimiters marking the parts of the model stack that will actually be quantized; however, other commentary ([quantization] how to quantize model which include not support to quantize layer) suggests that they also “record tensor statistics”, and hence that unique instances of them are needed – is that one distinct pair per contiguous quantized region?
Some more details would be very much appreciated.
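For concreteness, here is the minimal eager-mode static quantization setup I am reasoning about (my own sketch following the static quantization tutorial; the model, shapes, and calibration data are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized boundary
        self.conv = nn.Conv2d(1, 1, 1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)

# "calibration": run representative data through the prepared model so the
# inserted observers (including the one attached to the QuantStub, if my
# reading is right) can record activation statistics
prepared(torch.randn(1, 1, 4, 4))

quantized = torch.quantization.convert(prepared)
print(type(quantized.quant).__name__)  # the stub is swapped out in convert
```

If QuantStub really is just an observer holder during calibration, then presumably each stub must be a distinct module instance so it can end up with its own scale/zero-point after convert – which is exactly the point I would like confirmed.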