After QAT finished, I saved two versions of the model: the fake-quant model (`model_fp32_prepared`, i.e. before `torch.quantization.convert`) and the quantized model (after `torch.quantization.convert(model_fp32_prepared)`).
When running inference, I notice a big difference between the outputs of these two models (see the pic below; left: fake-quant model, right: quantized model).
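For reference, here is a minimal sketch of the flow I mean, using the eager-mode QAT API; `TinyNet`, the training loop placeholder, and the input tensor are just stand-ins, not my actual model:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Placeholder model; the real model is larger, this just shows the stubs.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where fp32 -> int8 happens
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks where int8 -> fp32 happens

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)

model_fp32 = TinyNet()
model_fp32.train()
model_fp32.qconfig = tq.get_default_qat_qconfig("fbgemm")
model_fp32_prepared = tq.prepare_qat(model_fp32)

# ... QAT training loop would run here ...

model_fp32_prepared.eval()
x = torch.randn(1, 4)
fake_quant_out = model_fp32_prepared(x)       # fp32 model with fake-quant modules

model_int8 = tq.convert(model_fp32_prepared)  # true int8 model
int8_out = model_int8(x)

# The two outputs should agree closely; a large gap is the problem I'm seeing.
print((fake_quant_out - int8_out).abs().max())
```

In my understanding, `convert` replaces the fake-quant observers with real quantized kernels using the scales/zero-points learned during QAT, so the two outputs should differ only by small rounding error.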