I read this from the PyToch docs.
For static quantization techniques which quantize activations, the user needs to do the following in addition:
Specify where activations are quantized and de-quantized. This is done using QuantStub and DeQuantStub modules.
Use torch.nn.quantized.FloatFunctional to wrap tensor operations that require special handling for quantization into modules. Examples are operations like add and cat which require special handling to determine output quantization parameters.
Fuse modules: combine operations/modules into a single module to obtain higher accuracy and performance. This is done using the torch.quantization.fuse_modules() API, which takes in lists of modules to be fused. We currently support the following fusions: [Conv, Relu], [Conv, BatchNorm], [Conv, BatchNorm, Relu], [Linear, Relu]
`
Seems that these three operations are not needed for quantization aware training. But from the API example and another tutorial((beta) Static Quantization with Eager Mode in PyTorch — PyTorch Tutorials 1.7.1 documentation), these three operations are also inclueded. So should they be implemented?