Big difference between the quantized model and the model with fake_quants

After QAT finished, I saved both the model with fake_quant modules (model_fp32_prepared, before convert) and the quantized model (after torch.quantization.convert(model_fp32_prepared)).
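For context, this is roughly the flow I mean (a minimal sketch of eager-mode QAT; TinyNet, the qconfig, and the module names are stand-ins, not my actual model):

```python
import copy
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the real model
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = TinyNet()
model_fp32.train()
model_fp32.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model_fp32_prepared = torch.quantization.prepare_qat(model_fp32)

# ... QAT training loop on model_fp32_prepared runs here ...

model_fp32_prepared.eval()
# model with fake_quant: the prepared model itself, before convert
fake_quant_model = copy.deepcopy(model_fp32_prepared)
# quantized model: the result of convert
quantized_model = torch.quantization.convert(model_fp32_prepared)
```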

When doing inference, I notice a big difference between these two models (see the pic below; left: model with fake_quant, right: quantized model).
[image: side-by-side inference outputs of the two models]
Is there a way to reduce the diff?
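(For concreteness, this is how I measure the diff, using the placeholder names from the sketch above:)

```python
# feed the same input through both models and compare the outputs
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_fq = fake_quant_model(x)
    out_q = quantized_model(x)
print("max abs diff:", (out_fq - out_q).abs().max().item())
```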

Can you give a repro? The issue could be any number of things; without more context it's impossible to tell.

Sorry, for some reason I can't share the repo.
One thing I suspect is that I replaced all the activation functions with ReLU.
Could that be an issue?

It certainly sounds like it; without a way to reproduce your result, though, it's difficult to say much more.
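If the swapped-in ReLUs do turn out to be the cause, one thing worth checking (an assumption on my part, since I can't see your model) is whether each conv + ReLU pair is fused before prepare_qat, so the fake_quant simulation and the converted int8 kernels see the same fused op. Following the pattern from the official QAT tutorial:

```python
# hypothetical module names ("conv", "relu"); fusion is done in eval mode,
# then the model goes back to train mode for prepare_qat
model_fp32.eval()
model_fused = torch.quantization.fuse_modules(model_fp32, [["conv", "relu"]])
model_fused.train()
model_fused.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model_prepared = torch.quantization.prepare_qat(model_fused)
```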