After QAT finished, I saved two versions of the model: the fake-quant model (`model_fp32_prepared`, i.e. before `torch.quantization.convert`) and the quantized model (after `torch.quantization.convert(model_fp32_prepared)`).
When running inference, I notice a big difference between the outputs of these two models (see the pic below; left: fake-quant model, right: quantized model).
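For reference, here is a minimal sketch of the flow I mean, using the eager-mode QAT API; `TinyNet`, the training loop placeholder, and the input tensor are just stand-ins, not my actual model:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Placeholder model; the real model is larger, this just shows the stubs.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where fp32 -> int8 happens
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks where int8 -> fp32 happens

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)

model_fp32 = TinyNet()
model_fp32.train()
model_fp32.qconfig = tq.get_default_qat_qconfig("fbgemm")
model_fp32_prepared = tq.prepare_qat(model_fp32)

# ... QAT training loop would run here ...

model_fp32_prepared.eval()
x = torch.randn(1, 4)
fake_quant_out = model_fp32_prepared(x)       # fp32 model with fake-quant modules

model_int8 = tq.convert(model_fp32_prepared)  # true int8 model
int8_out = model_int8(x)

# The two outputs should agree closely; a large gap is the problem I'm seeing.
print((fake_quant_out - int8_out).abs().max())
```

In my understanding, `convert` replaces the fake-quant observers with real quantized kernels using the scales/zero-points learned during QAT, so the two outputs should differ only by small rounding error.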