I read the quantization docs on the PyTorch website and learned about post-training dynamic quantization and static quantization. Dynamic quantization works well for LSTM and Linear layers, while static quantization is better for CNNs. My question: when I use a CRNN model (CNN + LSTM + Linear), what is the best way to quantize it? Are there any tricks to mix the two quantization methods?
I’d appreciate if anybody can help me! Thanks in advance!
I think it’s possible: you can apply static quantization to the CNN part of the model and dynamic quantization to the LSTM + Linear part. Since both parts take float data as input and produce float data as output, the combined model should work.
1. Keep the RNN and Linear layers fixed, quantize the CNN layers (post-training static quantization).
2. Keep the RNN and Linear layers fixed, quantize the CNN layers (quantization-aware training; this step is optional).
3. Keep the quantized CNN layers fixed, quantize the RNN and Linear layers (post-training dynamic quantization).
Quantization is controlled by the qconfig, so when quantizing the CNN layers you can leave the qconfig unset on the RNN layers; that way the RNN layers will not be quantized.
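Here is a minimal sketch of the recipe above in eager mode, skipping the optional QAT step. The `CRNN` model, its layer sizes, and the random calibration data are made up for illustration; the key idea is that the qconfig is set only on the CNN submodule (wrapped in `QuantStub`/`DeQuantStub`), and `quantize_dynamic` is applied afterwards to the LSTM and Linear layers:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class CRNN(nn.Module):
    """Toy CRNN for illustration: CNN -> LSTM -> Linear."""
    def __init__(self):
        super().__init__()
        # QuantStub/DeQuantStub mark the float<->quantized boundary,
        # so only the conv part runs with quantized tensors.
        self.cnn = nn.Sequential(
            tq.QuantStub(),
            nn.Conv2d(1, 8, 3),   # (B, 1, 28, 28) -> (B, 8, 26, 26)
            nn.ReLU(),
            tq.DeQuantStub(),     # back to float for the LSTM
        )
        self.lstm = nn.LSTM(input_size=8 * 26, hidden_size=32, batch_first=True)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        x = self.cnn(x)                       # (B, 8, 26, 26)
        x = x.flatten(1, 2).permute(0, 2, 1)  # (B, 26, 208): width as sequence
        out, _ = self.lstm(x)                 # float LSTM
        return self.fc(out[:, -1])            # float Linear on last timestep

model = CRNN().eval()

# Step 1: post-training static quantization of the CNN only.
# Only the cnn submodule gets a qconfig; lstm/fc have none, so they stay float.
torch.backends.quantized.engine = "fbgemm"   # x86 backend; use "qnnpack" on ARM
model.cnn.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(model, inplace=True)
with torch.no_grad():
    model(torch.randn(8, 1, 28, 28))         # calibrate with representative data
tq.convert(model, inplace=True)              # conv is now a quantized module

# Step 3: post-training dynamic quantization of the LSTM + Linear layers.
model = tq.quantize_dynamic(model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)
```

After this, a forward pass on a float input goes quantized-conv -> dynamically-quantized LSTM -> dynamically-quantized Linear, with float tensors flowing between the two parts.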