What do [De]QuantStub actually do?

I have scoured all documentation I could locate and am still confused on certain accounts:

Quantize stub module, before calibration, this is same as an observer, it will be swapped as nnq.Quantize in convert.

which unfortunately isn’t very helpful at all (which “observer”?). The last part of that sentence seems to suggest that at inference time this module will be replaced with one that does the actual float-to-int8 data conversion. But what does this module do at calibration time?

  • furthermore, tutorials seem to suggest that QuantStub and DeQuantStub act as delimiters of the parts of the model stack that will actually be subject to quantization; however, some other commentary ([quantization] how to quantize model which include not support to quantize layer) seems to suggest that these also “record tensor statistics” and hence unique instances of them are needed – what, one unique pair per contiguous quantization region?

Some more details would be very much appreciated.

QuantStub is just a placeholder for the quantize op; it needs to be unique since it has state.
DeQuantStub is a placeholder for the dequantize op; it does not need to be unique since it’s stateless.

In eager mode quantization, users need to manually place QuantStub and DeQuantStub in the model wherever an activation crosses the boundary between quantized and non-quantized code.

One thing to remember is that for a quantized module we always quantize the output of the module, but we don’t quantize its input, so quantization of the input Tensor has to be taken care of by the previous module. That’s why we have QuantStub here: basically, it quantizes the input for the next quantized module in the sequence.
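To make this concrete, here is a minimal sketch of the placement (the model and names like TwoRegionModel are just made up for illustration, using the torch.quantization eager mode API): each quantized region gets its own QuantStub, while one DeQuantStub can be reused.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class TwoRegionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant1 = QuantStub()     # quantizes the input of conv1
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.dequant = DeQuantStub()  # marks the exit from a quantized region
        self.quant2 = QuantStub()     # separate stub for the second region (it has its own state)
        self.conv2 = nn.Conv2d(8, 8, 3)

    def forward(self, x):
        x = self.quant1(x)       # float -> quantized boundary for region 1
        x = self.conv1(x)
        x = self.dequant(x)      # back to float for an op we keep unquantized
        x = torch.tanh(x) * 2.0  # stand-in for the float-only part
        x = self.quant2(x)       # float -> quantized boundary for region 2
        x = self.conv2(x)
        return self.dequant(x)   # DeQuantStub is stateless, so reuse is fine
```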

So in prepare, we’ll attach an observer to the output of QuantStub to record the statistics of the output Tensor, just like for other modules such as nn.Conv2d. The observer is specified by the qconfig.
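For example, continuing the hypothetical TwoRegionModel above, after prepare the observers show up as child modules named activation_post_process (the exact observer class depends on the qconfig; the default fbgemm qconfig should give a HistogramObserver for activations, if I remember correctly):

```python
import torch
from torch.quantization import get_default_qconfig, prepare

model = TwoRegionModel().eval()
model.qconfig = get_default_qconfig('fbgemm')
prepared = prepare(model)

# prepare attaches an activation_post_process observer (chosen by the qconfig)
# to QuantStub and to the other quantizable modules.
print(type(prepared.quant1.activation_post_process))
print(type(prepared.conv1.activation_post_process))

# Calibration is just running representative float data through the model;
# the observers record the tensor statistics.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 32, 32))
```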

And in convert, QuantStub will be swapped with an nnq.Quantize module, so the output of nnq.Quantize (the input of nnq.Conv2d) will be quantized.
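Continuing the same sketch, roughly what convert does to it (an illustration of the flow, not the exact internals):

```python
import torch
from torch.quantization import convert

quantized = convert(prepared)
print(type(quantized.quant1))   # nnq.Quantize, with scale/zero_point taken from its observer
print(type(quantized.conv1))    # nnq.Conv2d
print(type(quantized.dequant))  # nnq.DeQuantize

# The output of nnq.Quantize (i.e. the input of nnq.Conv2d) is a quantized tensor.
q = quantized.quant1(torch.randn(1, 3, 32, 32))
print(q.is_quantized, q.dtype)  # True torch.quint8 with the default qconfig
```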

Thank you, Jerry, this helps.

To clarify

In eager mode quantization

– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).

This might sound like a nit, but I think the following actually reflects a fundamental difficulty of quantizing in eager mode:

so the quantization of the input Tensor should be taken care of by the previous module,

How does PyTorch decide what is “previous”? The true sequence of layer invocations is determined by the procedural code in forward(), and it can involve branching at runtime or data flow merging, as with skip connections.

(I have yet to succeed in quantizing ResNet, I suspect because of this.)
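From other threads I gather that the add in a skip connection is supposed to go through torch.nn.quantized.FloatFunctional in eager mode, since a bare + is not a module that can be swapped. Something like the hypothetical block below, though I have not gotten this working end to end:

```python
import torch.nn as nn
import torch.nn.quantized as nnq

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # FloatFunctional carries its own observer state and gets swapped
        # to a quantized add during convert.
        self.skip_add = nnq.FloatFunctional()

    def forward(self, x):
        # instead of `self.conv(x) + x`
        return self.skip_add.add(self.conv(x), x)
```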

– that is the only mode available at the moment, correct? The JIT does not quantize (yet?).

Yeah, eager mode is the only mode that’s supported in the public release, but graph mode quantization is coming in 1.6 as well.

How does PyTorch decide what is “previous”?

PyTorch doesn’t do this in eager mode; that’s why in eager mode users need to place QuantStub and DeQuantStub manually. This is done automatically in graph mode quantization.
Eager mode just swaps every module that has a qconfig, so the user needs to make sure the swap makes sense, set the qconfig, and place QuantStub/DeQuantStub correctly.
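For example, with the hypothetical TwoRegionModel from above, the swap is driven entirely by the qconfig attribute, so a submodule can be kept in float by setting its qconfig to None before prepare; it’s then up to the user to keep the data flow consistent:

```python
import torch
from torch.quantization import get_default_qconfig, prepare, convert

model = TwoRegionModel().eval()
model.qconfig = get_default_qconfig('fbgemm')
# Keep conv2 in float: no observer is attached and it won't be swapped.
model.conv2.qconfig = None
# Then quant2 should be skipped too, so the float conv2 really receives a
# float tensor -- this is the "make sure the swap makes sense" part.
model.quant2.qconfig = None

prepared = prepare(model)
with torch.no_grad():
    prepared(torch.randn(1, 3, 32, 32))
quantized = convert(prepared)

print(type(quantized.conv1))  # torch.nn.quantized.Conv2d (swapped)
print(type(quantized.conv2))  # torch.nn.Conv2d (left in float)
```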