What do [De]QuantStub actually do?

I have scoured all documentation I could locate and am still confused on certain accounts:

Quantize stub module, before calibration, this is same as an observer, it will be swapped as nnq.Quantize in convert.

which unfortunately isn’t very helpful at all (which “observer”?). The last part of that sentence seems to suggest that at inference time this module will be replaced with one that does the actual float-to-int8 data conversion. But what does this module do at calibration time?

  • furthermore, tutorials seem to suggest that QuantStub and DeQuantStub act as delimiters of the parts of the model stack that will actually be subject to quantization; however, some other commentary ([quantization] how to quantize model which include not support to quantize layer) seems to suggest that these also “record tensor statistics” and hence unique instances of them are needed – what, one unique pair per contiguous quantization region?

Some more details would be very much appreciated.

QuantStub is just a placeholder for the quantize op; it needs to be unique since it has state.
DeQuantStub is a placeholder for the dequantize op; it does not need to be unique since it’s stateless.

In eager mode quantization, users need to manually place QuantStub and DeQuantStub in the model wherever an activation crosses the boundary between quantized and non-quantized code.

One thing to remember is that for a quantized module we always quantize the output of the module, but we don’t quantize its input, so quantization of the input Tensor has to be taken care of by the previous module. That’s why we have QuantStub here: basically, it quantizes the input for the next quantized module in the sequence.
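To make this concrete, here is a minimal sketch of the placement (the model and names like TwoRegionModel are just made up for illustration, using the torch.quantization eager mode API): each quantized region gets its own QuantStub, while one DeQuantStub can be reused.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class TwoRegionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant1 = QuantStub()     # quantizes the input of conv1
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.dequant = DeQuantStub()  # marks the exit from a quantized region
        self.quant2 = QuantStub()     # separate stub for the second region (it has its own state)
        self.conv2 = nn.Conv2d(8, 8, 3)

    def forward(self, x):
        x = self.quant1(x)       # float -> quantized boundary for region 1
        x = self.conv1(x)
        x = self.dequant(x)      # back to float for an op we keep unquantized
        x = torch.tanh(x) * 2.0  # stand-in for the float-only part
        x = self.quant2(x)       # float -> quantized boundary for region 2
        x = self.conv2(x)
        return self.dequant(x)   # DeQuantStub is stateless, so reuse is fine
```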

So in prepare, we’ll attach an observer to the output of QuantStub to record the statistics of the output Tensor, just like for other modules such as nn.Conv2d. The observer is specified by the qconfig.
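For example, continuing the hypothetical TwoRegionModel above, after prepare the observers show up as child modules named activation_post_process (the exact observer class depends on the qconfig; the default fbgemm qconfig should give a HistogramObserver for activations, if I remember correctly):

```python
import torch
from torch.quantization import get_default_qconfig, prepare

model = TwoRegionModel().eval()
model.qconfig = get_default_qconfig('fbgemm')
prepared = prepare(model)

# prepare attaches an activation_post_process observer (chosen by the qconfig)
# to QuantStub and to the other quantizable modules.
print(type(prepared.quant1.activation_post_process))
print(type(prepared.conv1.activation_post_process))

# Calibration is just running representative float data through the model;
# the observers record the tensor statistics.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 32, 32))
```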

And in convert, QuantStub will be swapped with an nnq.Quantize module, so the output of nnq.Quantize (the input of nnq.Conv2d) will be quantized.
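Continuing the same sketch, roughly what convert does to it (an illustration of the flow, not the exact internals):

```python
import torch
from torch.quantization import convert

quantized = convert(prepared)
print(type(quantized.quant1))   # nnq.Quantize, with scale/zero_point taken from its observer
print(type(quantized.conv1))    # nnq.Conv2d
print(type(quantized.dequant))  # nnq.DeQuantize

# The output of nnq.Quantize (i.e. the input of nnq.Conv2d) is a quantized tensor.
q = quantized.quant1(torch.randn(1, 3, 32, 32))
print(q.is_quantized, q.dtype)  # True torch.quint8 with the default qconfig
```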

Thank you, Jerry, this helps.

To clarify

In eager mode quantization

– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).

This might sound like a nit, but I think the following actually reflects a fundamental difficulty of quantizing in eager mode:

so the quantization of the input Tensor should be taken care of by the previous module,

How does PyTorch decide what is “previous”? The true sequence of layer invocations is determined by the procedural code in forward(), and it can involve branching at runtime or data flow merging, as with skip connections.

(I have yet to succeed in quantizing ResNet, I suspect because of this.)
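From other threads I gather that the add in a skip connection is supposed to go through torch.nn.quantized.FloatFunctional in eager mode, since a bare + is not a module that can be swapped. Something like the hypothetical block below, though I have not gotten this working end to end:

```python
import torch.nn as nn
import torch.nn.quantized as nnq

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # FloatFunctional carries its own observer state and gets swapped
        # to a quantized add during convert.
        self.skip_add = nnq.FloatFunctional()

    def forward(self, x):
        # instead of `self.conv(x) + x`
        return self.skip_add.add(self.conv(x), x)
```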

– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).

Yeah, eager mode is the only mode that’s supported in the public release, but graph mode quantization is coming in 1.6 as well.

How does PyTorch decide what is “previous”?

PyTorch doesn’t do this in eager mode; that’s why in eager mode users need to place QuantStub and DeQuantStub manually. This is done automatically in graph mode quantization.
Eager mode just swaps every module that has a qconfig, so the user needs to make sure the swap makes sense, set the qconfig, and place QuantStub/DeQuantStub correctly.
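For example, with the hypothetical TwoRegionModel from above, the swap is driven entirely by the qconfig attribute, so a submodule can be kept in float by setting its qconfig to None before prepare; it’s then up to the user to keep the data flow consistent:

```python
import torch
from torch.quantization import get_default_qconfig, prepare, convert

model = TwoRegionModel().eval()
model.qconfig = get_default_qconfig('fbgemm')
# Keep conv2 in float: no observer is attached and it won't be swapped.
model.conv2.qconfig = None
# Then quant2 should be skipped too, so the float conv2 really receives a
# float tensor -- this is the "make sure the swap makes sense" part.
model.quant2.qconfig = None

prepared = prepare(model)
with torch.no_grad():
    prepared(torch.randn(1, 3, 32, 32))
quantized = convert(prepared)

print(type(quantized.conv1))  # torch.nn.quantized.Conv2d (swapped)
print(type(quantized.conv2))  # torch.nn.Conv2d (left in float)
```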