I want to quantize my trained model 'model.pth' (model size: 188 MB)

I just trained a model using PyTorch, and it is 188 MB, which is too large to run in real-time inference. I want to reduce the model's size, and I know this can be done with quantization, but I have been unable to quantize my trained model. There are many examples of quantizing during training, but I could not find one for quantizing a model after training is finished.

Can you please share some code to quantize my large model?

Here is the link to my Colab notebook with code snippets for applying post-training static quantization to a custom model and a torchvision pre-trained model:
Post Training Static Quantization on Pre-trained Torchvision Models

Hope this helps.
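As a quick starting point, here is a minimal sketch of post-training *dynamic* quantization, which is often the simplest option for an already-trained model since it needs no calibration data. The `Net` class below is a hypothetical stand-in for your own architecture; you would replace it with the model class that produced `model.pth` and uncomment the `load_state_dict` line.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the trained 188 MB network;
# substitute your own model class before loading model.pth.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = Net()
# model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# are stored as int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized state_dict is what you save for the smaller deployment model.
torch.save(quantized.state_dict(), "model_quantized.pth")
```

Note that dynamic quantization covers `nn.Linear` (and LSTM/GRU) layers; for convolution-heavy models, the static quantization flow in the notebook above, with a calibration pass over representative data, is the appropriate route.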