I am trying dynamic quantization, and my model is mostly composed of linear layers, so I can't use any other quantization technique. Dynamic quantization to qint8 reduced the model size, but it also reduced the model's accuracy. Is there any way I can improve the accuracy?
Static quantization supports Linear modules, if you'd like to try it out. Some accuracy drop is expected with quantization, but you can limit it by quantizing only selected layers in your model rather than all of them.
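As a rough sketch of what selective quantization could look like, here is one way to do it with dynamic quantization, by passing submodule names to `qconfig_spec` in `torch.quantization.quantize_dynamic`. The `TinyMLP` model, its layer names (`fc1`, `fc2`, `head`), and the choice to keep `head` in float are all made up for illustration; which layers to leave unquantized in your model is something you would have to find by measuring accuracy.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Hypothetical stand-in for a model made mostly of linear layers."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 32)
        self.head = nn.Linear(32, 4)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.head(x)


torch.manual_seed(0)
float_model = TinyMLP().eval()

# Quantize only fc1 and fc2 by name; head is not listed, so it stays in float.
# (Assumption: head is the accuracy-sensitive layer. Verify this on your data.)
partial_model = torch.quantization.quantize_dynamic(
    float_model,
    qconfig_spec={"fc1", "fc2"},
    dtype=torch.qint8,
)

# Compare outputs of the float and partially quantized models on random input.
x = torch.randn(8, 16)
with torch.no_grad():
    float_out = float_model(x)
    partial_out = partial_model(x)

max_abs_diff = (float_out - partial_out).abs().max().item()
print(partial_model)
print("max abs difference vs. float model:", max_abs_diff)
```

In practice you would replace the random input with your validation set and try different subsets of layers, keeping in float the ones whose quantization costs the most accuracy.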