Static quantization inference

FengMu1995 · December 13, 2021, 3:16am

If my model consists of deep, nested layers, should I insert quant and dequant into every layer?

supriyar · December 13, 2021, 6:17pm

If all the operators in the model can be quantized, you only need to insert a quant at the beginning of the model and a dequant at the end. See the tutorial "(beta) Static Quantization with Eager Mode in PyTorch" (PyTorch Tutorials 1.10.0+cu102 documentation) for more details.
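As a rough sketch of that first case, assuming a small made-up model (SmallNet, its layer sizes, and the random calibration batch are illustrative only, not from the tutorial), a single QuantStub at the input and a single DeQuantStub at the output cover all the nested layers in between:

```python
import torch
import torch.nn as nn

# Pick a quantized backend that this build of PyTorch supports.
backend = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = backend


class SmallNet(nn.Module):
    """Hypothetical nested model; every op in it can run on quantized tensors."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized, once at the input
        self.features = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()),
            nn.Sequential(nn.Conv2d(8, 8, 3), nn.ReLU()),
        )
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float, once at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        return self.dequant(x)


model = SmallNet().eval()
# A qconfig set on the root is propagated to all nested submodules.
model.qconfig = torch.quantization.get_default_qconfig(backend)

prepared = torch.quantization.prepare(model)      # insert observers
prepared(torch.randn(4, 3, 16, 16))               # calibrate (random data, for illustration only)
quantized = torch.quantization.convert(prepared)  # swap in quantized modules

out = quantized(torch.randn(1, 3, 16, 16))
print(type(quantized.features[0][0]), out.dtype)
```

In real use you would calibrate with representative data rather than random tensors, and usually fuse Conv + ReLU pairs before calling prepare.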

If only specific layers can be quantized, you have to wrap each of them individually in quant-dequant blocks in eager mode. Alternatively, you could try the FX graph mode quantization flow, which should automate the process: see "(prototype) FX Graph Mode Post Training Static Quantization" (PyTorch Tutorials 1.10.0+cu102 documentation).
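For the eager-mode case with only some quantizable layers, one possible sketch uses torch.quantization.QuantWrapper, which puts a QuantStub and DeQuantStub around a single module. MixedNet and the choice to keep GELU in float are assumptions made purely for illustration:

```python
import torch
import torch.nn as nn

backend = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = backend


class MixedNet(nn.Module):
    """Hypothetical model where only the Linear layers are quantized."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.act = nn.GELU()  # kept in float in this sketch
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


model = MixedNet().eval()
qconfig = torch.quantization.get_default_qconfig(backend)

# Wrap only the layers we want quantized. The qconfig goes on the wrapper so
# that its quant stub, inner module, and dequant stub all inherit it.
for name in ("fc1", "fc2"):
    wrapped = torch.quantization.QuantWrapper(getattr(model, name))
    wrapped.qconfig = qconfig
    setattr(model, name, wrapped)

# The root has no qconfig, so unwrapped modules (here: act) are left untouched.
prepared = torch.quantization.prepare(model)
prepared(torch.randn(8, 16))                      # calibrate (random data, for illustration only)
quantized = torch.quantization.convert(prepared)

out = quantized(torch.randn(2, 16))
print(quantized)
```

Each wrapped layer quantizes its input and dequantizes its output, so the float op in between sees ordinary float tensors. The cost is an extra quantize/dequantize pair per wrapped layer, which is why wrapping larger quantizable blocks is generally preferable when possible.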