The quantization documentation suggests that Conv1d/2d/3d layers do not support dynamic quantization, while GRU and LSTM support only dynamic quantization. So what is the best way to quantize models with such an architecture?
Hi @Hari_Krishnan ,
For now, what you mentioned above is correct. In the future, there may be dynamic quantization of conv or static quantization of parts of GRU, but that does not exist at the moment.
You could try quantizing the convolutions statically and your GRU/LSTM layers dynamically. If you are using FX graph mode quantization, you can specify which layers to quantize in which way and the framework will model the dtype transitions for you. If you are using eager mode quantization, you’d have to model the dtype transitions yourself.
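To illustrate the FX graph mode approach, here is a minimal sketch of mixing static quantization for a conv layer with dynamic quantization for an LSTM via a per-module-type `QConfigMapping`. It assumes a reasonably recent PyTorch (the `torch.ao.quantization` FX APIs) and an x86 machine with the `fbgemm` backend; the `ConvLSTM` model is a made-up toy example, not your actual architecture.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig, default_dynamic_qconfig
from torch.ao.quantization.qconfig_mapping import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Toy model: a 1D conv feature extractor followed by an LSTM (hypothetical example).
class ConvLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 8, kernel_size=3)
        self.lstm = nn.LSTM(8, 16, batch_first=True)

    def forward(self, x):
        x = self.conv(x)         # (N, C, T)
        x = x.permute(0, 2, 1)   # (N, T, C) for the LSTM
        out, _ = self.lstm(x)
        return out

torch.backends.quantized.engine = "fbgemm"  # assumes x86; use "qnnpack" on ARM
model = ConvLSTM().eval()
example_inputs = (torch.randn(2, 4, 32),)

# Static qconfig for the conv, dynamic qconfig for the LSTM.
qconfig_mapping = (
    QConfigMapping()
    .set_object_type(nn.Conv1d, get_default_qconfig("fbgemm"))
    .set_object_type(nn.LSTM, default_dynamic_qconfig)
)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)        # calibration pass for the static observers
quantized = convert_fx(prepared)

out = quantized(*example_inputs)
print(out.shape)                 # (2, 30, 16): T shrinks by 2 from the conv
```

FX tracing lets `convert_fx` insert the quantize/dequantize ops at the boundary between the statically quantized conv and the float input the dynamic LSTM expects, which is the "dtype transitions" handling mentioned above. In eager mode you would place `QuantStub`/`DeQuantStub` modules around the conv yourself and run `quantize_dynamic` separately on the recurrent layers.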