If a model has a CNN backbone and an LSTM head, how do I quantize the whole model with post-training quantization? It seems we can apply static quantization to the CNN and dynamic quantization to the LSTM (see the Quantization page of the PyTorch 1.12 documentation), but I am not sure how to combine the two in a single model like this one.
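For reference, here is a minimal sketch of one way the two modes might be combined in eager mode: static quantization for the backbone first, then dynamic quantization for the LSTM and the final linear layer. The `CnnLstm` class, its layer sizes, and the random calibration batches are all hypothetical stand-ins, and real calibration should use representative data. It assumes the default quantized engine of the local PyTorch build.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class CnnLstm(nn.Module):
    """Hypothetical toy model: a conv backbone followed by an LSTM head."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized at the backbone input
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.dequant = tq.DeQuantStub()  # quantized -> float before the LSTM
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        x = self.dequant(self.backbone(self.quant(x)))  # (N, 8, H, W), float again
        x = x.mean(dim=2).permute(0, 2, 1)              # (N, W, 8): width as time axis
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])


torch.manual_seed(0)
model = CnnLstm().eval()

# Step 1: static post-training quantization, restricted to the backbone.
backend = torch.backends.quantized.engine        # engine of this build
model.qconfig = tq.get_default_qconfig(backend)
model.lstm.qconfig = None                        # exclude the LSTM from static PTQ
model.fc.qconfig = None                          # exclude the linear head as well
tq.fuse_modules(model, [["backbone.0", "backbone.1"]], inplace=True)
tq.prepare(model, inplace=True)                  # insert observers
with torch.no_grad():
    for _ in range(8):                           # calibration (random data: assumption)
        model(torch.randn(2, 3, 8, 16))
tq.convert(model, inplace=True)                  # swap in quantized backbone modules

# Step 2: dynamic quantization for the modules left in float.
q_model = tq.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8, inplace=True
)

with torch.no_grad():
    y = q_model(torch.randn(2, 3, 8, 16))
print(type(q_model.backbone[0]).__name__, type(q_model.lstm).__name__, tuple(y.shape))
```

The `DeQuantStub` placed before the LSTM matters here: the dynamically quantized LSTM expects float inputs, so the activations have to leave the quantized domain at the end of the backbone.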
How can I quantize the whole model if I have ResNet blocks followed by an LSTM layer? When I did not quantize the LSTM, the model's accuracy was halved, and when I quantized only the LSTM with post-training dynamic quantization (PTDQ), the speedup was negligible.
Generally, quantizing more modules isn't going to improve accuracy. If the model is less accurate when the LSTMs are NOT quantized, something is going very wrong. I would take a deeper look at it, because that should be close to impossible.
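One way to start that deeper look is to feed the same input to the float and quantized versions of a single stage and measure how far the outputs drift apart. The sketch below does this for a dynamically quantized LSTM head using a signal-to-quantization-noise ratio. The `TinyHead` module and the `sqnr_db` helper are hypothetical names made up for illustration, and the same comparison could be repeated on the backbone to see which stage loses the accuracy.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


def sqnr_db(reference, candidate):
    """Signal-to-quantization-noise ratio in dB; higher means closer to float."""
    noise_power = (reference - candidate).pow(2).mean().clamp_min(1e-12)
    return (10 * torch.log10(reference.pow(2).mean() / noise_power)).item()


class TinyHead(nn.Module):
    """Hypothetical LSTM head standing in for the real model's head."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])


torch.manual_seed(0)
float_head = TinyHead().eval()
# quantize_dynamic returns a quantized copy by default; float_head stays float.
quant_head = tq.quantize_dynamic(float_head, {nn.LSTM, nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 10, 8)  # random input: an assumption, use real features instead
with torch.no_grad():
    ref = float_head(x)
    out = quant_head(x)

score = sqnr_db(ref, out)
print(f"SQNR of the dynamically quantized head: {score:.1f} dB")
```

A stage whose score is much lower than the others is the first place to check, for example for missing calibration, unfused modules, or a missing dequantize step.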