Should I do the following if I implement quantization-aware training (QAT)?

I read this in the PyTorch docs:
For static quantization techniques which quantize activations, the user needs to do the following in addition:

Specify where activations are quantized and de-quantized. This is done using QuantStub and DeQuantStub modules.

Use torch.nn.quantized.FloatFunctional to wrap tensor operations that require special handling for quantization into modules. Examples are operations like add and cat which require special handling to determine output quantization parameters.

Fuse modules: combine operations/modules into a single module to obtain higher accuracy and performance. This is done using the torch.quantization.fuse_modules() API, which takes in lists of modules to be fused. We currently support the following fusions: [Conv, Relu], [Conv, BatchNorm], [Conv, BatchNorm, Relu], [Linear, Relu]
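For reference, the three steps quoted above can be sketched on a float model as follows. This is only a minimal illustration: the `ResidualBlock` class, its layer names, and the tensor sizes are made up for this post, and fusion is done in eval mode here because that is the mode in which `fuse_modules` folds BatchNorm into the convolution.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Hypothetical toy block, used only to illustrate the three steps."""

    def __init__(self, channels=8):
        super().__init__()
        # Step 1: mark where activations get quantized and de-quantized.
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        # Step 2: wrap the residual add in a FloatFunctional module.
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        x = self.quant(x)
        out = self.relu(self.bn(self.conv(x)))
        out = self.skip_add.add(out, x)  # instead of `out + x`
        return self.dequant(out)


model = ResidualBlock().eval()

# Step 3: fuse [Conv, BatchNorm, Relu] into one module (returns a copy by default).
fused = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])

x = torch.randn(1, 8, 16, 16)
same_output = torch.allclose(model(x), fused(x), atol=1e-5)
```

Before any quantization is applied, the stubs and the `FloatFunctional` wrapper pass tensors through unchanged, so the model still behaves like the original float model.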

It seems that these three operations are not needed for quantization-aware training. But in the API example and in another tutorial ((beta) Static Quantization with Eager Mode in PyTorch — PyTorch Tutorials 1.7.1 documentation), these three operations are also included. So should they be implemented?

Yes, all of these are used for QAT in Eager mode. Fusion is optional but recommended for higher performance and accuracy. Is there any information / doc stating otherwise?
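As a rough sketch of how this fits together for QAT in Eager mode (the `TinyNet` model, the random data, and the dummy loss are placeholders invented for this example, and the backend choice is an assumption that depends on your machine):

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical toy model, for illustration only."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 4, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


# Assumption: use fbgemm on x86 if available, otherwise fall back to qnnpack.
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig(backend)

# Optional but recommended: fuse before inserting fake-quant modules.
torch.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)
torch.quantization.prepare_qat(model, inplace=True)

# Placeholder training loop on random data with a dummy loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(2, 3, 8, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()

# After training, switch to eval mode and convert to a real int8 model.
model.eval()
quantized = torch.quantization.convert(model)
out = quantized(torch.randn(1, 3, 8, 8))
```

Only Conv + ReLU is fused here to keep the sketch short. How Conv + BatchNorm fusion should be done for a model in training mode differs between PyTorch releases, so check the docs for the version you are using.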

Hi @Vasiliy_Kuznetsov, since add and cat are common operations in CNN models, does every add or cat need to be wrapped with torch.nn.quantized.FloatFunctional? And are there any other operations that need this, or only add and cat?

In Eager mode - yes. There is a full list of ops supported by FloatFunctional here: torch.nn.quantized — PyTorch 1.7.0 documentation.
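To make that concrete, here is a small sketch of replacing `torch.cat` and `+` with FloatFunctional calls. The `TwoBranch` module is hypothetical, and the quant/dequant stubs are left out to keep it short. Using a separate FloatFunctional instance per call site is my understanding of the usual practice, since each instance collects its own output statistics.

```python
import torch
import torch.nn as nn


class TwoBranch(nn.Module):
    """Hypothetical module showing one FloatFunctional per call site."""

    def __init__(self):
        super().__init__()
        self.left = nn.Conv2d(3, 4, 1)
        self.right = nn.Conv2d(3, 4, 1)
        self.join = nn.quantized.FloatFunctional()      # wraps the cat
        self.residual = nn.quantized.FloatFunctional()  # wraps the add

    def forward(self, x):
        a, b = self.left(x), self.right(x)
        merged = self.join.cat([a, b], dim=1)        # instead of torch.cat([a, b], dim=1)
        return self.residual.add(merged, merged)     # instead of merged + merged


model = TwoBranch().eval()
x = torch.randn(1, 3, 5, 5)
y = model(x)

# Plain-tensor reference computed without the wrappers.
reference = torch.cat([model.left(x), model.right(x)], dim=1) * 2

# Some of the methods FloatFunctional exposes besides add and cat.
other_ops = ["add_scalar", "mul", "mul_scalar", "add_relu"]
available = [name for name in other_ops if hasattr(model.join, name)]
```

In a float model the wrapped calls give the same result as the plain tensor ops, so the swap can be done before deciding on the rest of the quantization setup.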