I just trained a model in PyTorch that is 188 MB, which is too heavy for real-time inference, so I want to reduce its size. I know this can be done with quantization, but I tried and was unable to quantize my already-trained model. There are many examples of quantizing during training (quantization-aware training), but I can't find one for quantizing a model after it has been trained.
Can you please share some code to easily quantize my heavy model?
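For context, one approach that applies to an already-trained model is PyTorch's post-training dynamic quantization, which converts the weights of selected layer types to int8 without any retraining. A minimal sketch (the small `nn.Sequential` model here is a stand-in assumption for the real 188 MB network; dynamic quantization mainly helps models dominated by `Linear`/`LSTM` layers):

```python
import io

import torch
import torch.nn as nn

# Stand-in for the trained model (assumption: the real network contains
# Linear layers, which is what dynamic quantization targets).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Post-training dynamic quantization: Linear weights become int8,
# activations are quantized on the fly at inference time. No retraining.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size(m: nn.Module) -> int:
    """Size of the saved state_dict in bytes, as a proxy for file size."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

print(f"fp32 model:  {serialized_size(model)} bytes")
print(f"int8 model:  {serialized_size(quantized)} bytes")

# The quantized model is used exactly like the original one.
out = quantized(torch.randn(1, 128))
print(out.shape)
```

Since the int8 weights take a quarter of the space of fp32 weights, the serialized size drops roughly 4x for Linear-heavy models; convolution-heavy models usually need static quantization with calibration instead.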