Unable to reproduce accuracy for quantized mobilenet_v2 model

As the title states, I’m trying to reproduce the results for the quantized mobilenet_v2 model trained with QAT. PyTorch reports an accuracy of 71.658% for the model (mobilenet_v2 — Torchvision main documentation). I used PyTorch’s quantization reference scripts (vision/references/classification at release/0.13 · pytorch/vision · GitHub), which state that the mobilenet_v2 model converges after 10 epochs. With the qnnpack backend from the first link I get an accuracy of 69.2%, and with an fbgemm model I get 70.4%. My model converged after 6–7 epochs for both backends, but these accuracies are quite low compared to the reported one.

Any ideas on what I should look at to bridge this gap?


Hi @Anirudh_Alameluvari,
can you provide your training hyperparameters, especially the batch size and the number of GPUs used for QAT?

Batch size used with qnnpack: 32; with fbgemm: 64. The number of GPUs is 1 in both cases (a Tesla K80).

Thanks for the information @Anirudh_Alameluvari

These are my observations:
Quantization parameters: I believe you are using the same configurations.
Hyperparameters: You need to adjust the learning rate according to the batch size. The MobileNetV2 paper mentions this under the Experiments section; please look into that and change the learning rate for QAT. I am sure that the batch size will impact the accuracy of the model, and a smaller batch size will increase the number of iterations and epochs needed to achieve the expected accuracy.
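
The usual heuristic for this is the linear scaling rule: keep the ratio of learning rate to batch size constant. A small sketch (the base values here are illustrative placeholders, not the exact ones from the torchvision reference script):

```python
def scaled_lr(batch_size: int, base_lr: float = 0.01, base_batch: int = 256) -> float:
    """Linear scaling rule: keep lr / batch_size constant.

    base_lr and base_batch are hypothetical values for illustration;
    substitute the ones the reference script was tuned with.
    """
    return base_lr * batch_size / base_batch

# Smaller batches get a proportionally smaller learning rate.
lr_qnnpack = scaled_lr(32)   # 0.00125
lr_fbgemm = scaled_lr(64)    # 0.0025
```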

Other than that, nothing from my end. If you find a concrete cause, please tag and educate me.
Thank you and happy learning!

Hi Anirudh,

I don’t think this is the script used for the model in the first link, but we have a tutorial for QAT on mobilenet_v2 here:


I’ve run that tutorial before and know it should produce ~71.5% accuracy on FBGEMM. Could you try running it and see if you get the expected results? It probably won’t produce exactly the same numbers as this model, but it should be > 70.4%.

Hey @jcaip, thank you. Your suggestion worked: with fbgemm, I got an accuracy of 71.3%.
qq: are there any suggestions for speeding up the model? I profiled my code using the PyTorch profiler, and it turns out most of the time was spent in the kernel, which I’m guessing refers to the I/O. My GPU is at near 100% utilization.

Thank you @nkdatascientist, I’ll tweak the batch size next. For now, changing the backend improved the accuracy quite a bit.

I think it’s the opposite, actually. “Kernel” refers to GPU compute (e.g. matmul ops), and if you were I/O bound you would expect to see < 100% GPU utilization.
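
One way to check where the time actually goes is `torch.profiler` with both CPU and CUDA activities enabled: GPU kernel time shows up under the CUDA activity, while dataloader/I/O time shows up on the CPU side. A minimal sketch (the toy model is just a stand-in for your network):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy model as a stand-in; profile your real training/inference step instead.
model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)  # records GPU kernel time

with profile(activities=activities) as prof:
    for _ in range(10):
        model(x)

# GPU kernels appear as CUDA rows; dataloading/host work appears as CPU rows.
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```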

You can try `fuse_modules` (see here, under the QAT API Example), but I think `qat_convert` may do this already.
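
For reference, fusion merges adjacent Conv/BN/ReLU modules into a single fused module, which saves kernel launches at inference. A minimal sketch on a toy Conv–BN–ReLU stack (torchvision's quantizable mobilenet_v2 already does the equivalent via `fuse_model()`; `fuse_modules_qat` assumes torch ≥ 1.12):

```python
import torch
from torch.ao.quantization import fuse_modules_qat

# Toy Conv-BN-ReLU stack; in a real model you list the actual module names.
m = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)
m.train()  # QAT fusion requires training mode

# Fuse the three modules into one intrinsic ConvBnReLU2d module.
fused = fuse_modules_qat(m, [["0", "1", "2"]])
```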
