The docstring says: "Quantize stub module, before calibration, this is same as an observer, it will be swapped as nnq.Quantize in convert,"
which unfortunately isn't very helpful at all (which "observer"?). The last part of that sentence seems to suggest that at inference time this module will be replaced with one that does the actual float-to-int8 data conversion. But what does this module do at calibration time?
Furthermore, the tutorials seem to suggest that QuantStub and DeQuantStub act as delimiters of the parts of the model stack that will actually be subject to quantization; however, some other commentary ([quantization] how to quantize model which include not support to quantize layer - #2 by jerryzh168) seems to suggest that they also "record tensor statistics" and hence that unique instances of them are needed. So is one unique pair needed per contiguous quantization region?
QuantStub is just a placeholder for the quantize op; it needs to be unique since it has state.
DeQuantStub is a placeholder for the dequantize op, but it does not need to be unique since it is stateless.
In eager mode quantization, users need to manually place QuantStub and DeQuantStub in the model wherever an activation crosses the boundary between quantized and non-quantized code.
One thing to remember is that for a quantized module we always quantize the output of the module, but we don't quantize its input, so the quantization of the input Tensor should be taken care of by the previous module. That's why we have QuantStub here: basically to quantize the input for the next quantized module in the sequence.
So in prepare, we attach an observer to the output of QuantStub to record the statistics of the output Tensor, just like for other modules such as nn.Conv2d. The observer is specified by the qconfig.
And in convert, QuantStub will be swapped for an nnq.Quantize module, and the output of nnq.Quantize (the input of nnq.Conv2d) will be quantized.
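To make that concrete, here is a minimal sketch of eager-mode post-training static quantization. The model, layer sizes, and calibration data are invented for illustration; QuantStub, DeQuantStub, prepare, and convert are the standard torch.quantization entry points discussed above.

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # quantizes the input for the first quantized module
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # converts back to float at the boundary

    def forward(self, x):
        x = self.quant(x)                # quantized region starts here
        x = self.relu(self.conv(x))
        x = self.dequant(x)              # quantized region ends here
        return x

model = SmallModel().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")

# prepare: attaches an observer to the output of QuantStub (and of Conv2d)
# so tensor statistics are recorded during calibration
prepared = tq.prepare(model)

with torch.no_grad():
    for _ in range(4):                   # calibration with placeholder data
        prepared(torch.randn(1, 3, 32, 32))

# convert: QuantStub becomes nnq.Quantize, Conv2d becomes its quantized
# counterpart, DeQuantStub becomes a dequantize op
quantized = tq.convert(prepared)
print(quantized)
```

Printing the converted model should show Quantize where QuantStub used to be, which is exactly the swap described above.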
– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).
This might sound like a nit, but I think the following actually reflects a fundamental difficulty in quantizing in eager mode:
so the quantization of the input Tensor should be taken care of by the previous module,
How does PyTorch decide what is "previous"? The true sequence of layer invocations is determined by the procedural code in forward(), and it can involve branching at runtime or data flow merging, as with skip connections.
– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).
Yeah, eager mode is the only mode that's supported in the public release, but graph mode is coming in 1.6 as well.
How does PyTorch decide what is "previous"?
PyTorch doesn't do this in eager mode; that's why in eager mode users need to manually place QuantStub and DeQuantStub themselves. This is done automatically in graph mode quantization.
Eager mode will just swap all modules that have a qconfig, so the user needs to make sure the swap makes sense, set the qconfig, and place QuantStub/DeQuantStub correctly.
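As an illustration of that per-module qconfig behavior, here is a hedged sketch of quantizing only part of a model during QAT by assigning a qconfig only to the submodules that should be quantized; the model and layer names are invented for the example.

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class PartiallyQuantized(nn.Module):
    def __init__(self):
        super().__init__()
        # the part we want quantized, wrapped in its own QuantStub/DeQuantStub pair
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.dequant = tq.DeQuantStub()
        # the part that should stay in float (it will get no qconfig)
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        x = self.quant(x)            # enter the quantized region
        x = self.conv(x)
        x = self.dequant(x)          # leave the quantized region
        x = x.mean(dim=(2, 3))       # regular float ops from here on
        return self.linear(x)

model = PartiallyQuantized().train()

# assign a QAT qconfig only to the modules that should be quantized;
# modules without a qconfig are left alone by prepare_qat
qat_qconfig = tq.get_default_qat_qconfig("fbgemm")
model.quant.qconfig = qat_qconfig
model.conv.qconfig = qat_qconfig

prepared = tq.prepare_qat(model)
print(prepared)   # conv carries FakeQuantize; linear stays a plain nn.Linear
```

Equivalently, if a global model.qconfig has been set, explicitly setting model.linear.qconfig = None should exclude that submodule from prepare_qat.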
When I do QAT (quantization-aware training), the tutorials say I should put QuantStub and DeQuantStub between the self.modules within forward. So what do they do at training time? Are they only meaningful at inference and validation time?
Thank you very much.
So they only take effect during module convert and do not do anything during QAT training, even after model.eval()? When I change the quant config and then validate the model, the mAP is different.
My purpose is to quantize only part of the model during QAT, but when I execute prepare_qat, I see that FakeQuantize is inserted into all of the modules, and QuantStub and DeQuantStub do not do anything. Is there any way to insert FakeQuantize into only part of the model?