Best practice or suggestion for QAT?

I am curious about disable_observer and freeze_bn_stats in quantization aware training. I don’t know when I should apply them. I have tried different combinations of the two parameters, and they seem to have a big impact on accuracy. Is there a best practice for quantization aware training? For example, should I disable the observer first, and if so, when? Should I train from scratch or fine-tune a trained model?

hi @eleflea, check out https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py for one example. One approach which has proven to work well is:

  • start QAT training from a floating point pre-trained model, with observers and fake_quant enabled
  • after a couple of epochs, freeze the BN stats if your network has any BNs (epoch == 3 in the example)
  • after a couple of epochs, disable observers (epoch == 4 in the example); see the sketch below
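
Roughly, the training loop could look like the sketch below (eager-mode QAT APIs: prepare_qat, disable_observer, freeze_bn_stats, convert). Note that build_float_model, train_one_epoch, evaluate, optimizer and the data loaders are placeholders for your own code, not part of the linked script:

```python
import torch
import torch.quantization

# Placeholders: build_float_model() should return a quantization-ready float model
# (QuantStub/DeQuantStub inserted, conv+bn+relu fused) loaded from a pre-trained
# checkpoint; train_one_epoch/evaluate/optimizer/data loaders are your own code.
model = build_float_model()
model.train()

model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)  # inserts observers + fake_quant

num_epochs = 8
freeze_bn_epoch = 3         # matches "epoch == 3" above
disable_observer_epoch = 4  # matches "epoch == 4" above

for epoch in range(num_epochs):
    if epoch >= disable_observer_epoch:
        # stop updating quantization ranges; fake_quant stays active
        model.apply(torch.quantization.disable_observer)
    if epoch >= freeze_bn_epoch:
        # stop updating BN running stats
        model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)

    train_one_epoch(model, optimizer, data_loader, epoch)

    # evaluate an actual int8 model by converting a copy
    # (inplace=False leaves the QAT model untouched for further training)
    int8_model = torch.quantization.convert(model.eval(), inplace=False)
    evaluate(int8_model, data_loader_test)
    model.train()
```

The cutover epochs here are just the values from the linked torchvision script; they are worth tuning for your own model and dataset.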

Thanks, I’ll try it.