Can't reproduce QAT accuracy

I followed the tutorial tutorials/static_quantization_tutorial.rst at master · pytorch/tutorials · GitHub.
The MobileNet-V2 baseline and PTQ work as expected, but QAT top-1 accuracy is only 67.88% after 8 epochs.
Log here: Unknown server log [#TRHsVfl] - mclo.gs

Thanks.

The tutorial shows an easy-to-run example in which you train on only a few batches of data (num_train_batches = 20). Also, the 71.5% result comes from training for 30 epochs on the full training dataset. If you are training on the full ImageNet dataset, your results look fine. A full script to reproduce the training numbers is available at: vision/train_quantization.py at master · pytorch/vision · GitHub
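For context, here is a minimal sketch of that difference. It assumes the names from the tutorial (`qat_model`, `criterion`, `data_loader`, `train_one_epoch`) and is not a drop-in script:

```python
import torch

# Sketch only: qat_model, criterion, data_loader, and train_one_epoch are
# assumed to be defined as in the tutorial.
optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.0001)

num_epochs = 30  # the 71.5% number assumes ~30 epochs on the full dataset

for epoch in range(num_epochs):
    # The tutorial demo caps each epoch at num_train_batches = 20 batches;
    # to approach 71.5%, train over the entire loader instead.
    train_one_epoch(qat_model, criterion, optimizer, data_loader,
                    torch.device('cpu'), len(data_loader))
```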


@raghuramank100 Thanks for your kind reply. You are right. I am reproducing the accuracy as you suggested, and the results look good.

  1. Bug 30125 and its comments provide some useful info. Believe it or not, the QAT part of static_quantization_tutorial.rst is NOT a good guide by itself, because following it as written can't reproduce 71.5%.

  2. I found some differences between static_quantization_tutorial.rst and train_quantization.py that seem to explain the accuracy gap. Hyperparameters and training tricks really do matter, right?

  3. BTW, deciding when and why to adjust hyperparameters, freeze batch-norm statistics, or disable observers during training is not easy (see the sketch below). Are there any QAT tricks or routines for networks other than MobileNet-V2, e.g. other classification, detection, or segmentation models? That would be very helpful.
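For reference, a hedged sketch of the freeze/disable schedule used in the tutorial for MobileNet-V2; the epoch thresholds (2 and 3) are tutorial choices and will likely need tuning for other networks:

```python
import torch

# Sketch based on static_quantization_tutorial.rst; qat_model is assumed
# to have been prepared with torch.quantization.prepare_qat beforehand.
for epoch in range(num_epochs):
    # ... run one epoch of QAT fine-tuning on qat_model here ...

    if epoch > 3:
        # Freeze quantizer parameters: observers stop updating ranges
        qat_model.apply(torch.quantization.disable_observer)
    if epoch > 2:
        # Freeze batch-norm mean and variance estimates
        qat_model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)
```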

Thanks.