Saving a quantized model

I converted an FP32 model to an 8-bit model using post-training static quantization. I tried to save the model with torch.save() and torch.jit.save(), but neither method works. I then tried saving just the state_dict, but when I load it back, the results are not consistent. Is there another way to save a quantized model?
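For reference, the quantization itself followed the usual eager-mode post-training static quantization recipe, roughly like this (a minimal sketch; MyModel, the layer sizes, and the calibration input are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = MyModel().eval()
model_fp32.qconfig = torch.quantization.get_default_qconfig("fbgemm")

# Fuse conv+relu, insert observers, calibrate, then convert to int8.
torch.quantization.fuse_modules(model_fp32, [["conv", "relu"]], inplace=True)
torch.quantization.prepare(model_fp32, inplace=True)
with torch.no_grad():
    model_fp32(torch.randn(1, 3, 32, 32))  # placeholder calibration data
model_int8 = torch.quantization.convert(model_fp32, inplace=False)
```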

If you need any more info please let me know.

Thanks in advance.

Saving and loading the state_dict is the preferred method. Save the state_dict, and then before loading it into a quantized model, make sure to follow the same quantization steps (e.g., fusion) so the module structure matches the checkpoint. Also see Loading of Quantized Model.
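A minimal sketch of that flow, assuming the toy MyModel and model_int8 from the snippet in the question above (the fuse list and file name are placeholders):

```python
import torch

# Persist only the state_dict of the already-quantized model.
torch.save(model_int8.state_dict(), "quantized_model.pth")

# To load, first rebuild the *same* quantized structure: fuse, prepare,
# and convert a fresh FP32 instance exactly as was done before saving.
model = MyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)
torch.quantization.prepare(model, inplace=True)
loaded_int8 = torch.quantization.convert(model, inplace=False)

# Only now does the checkpoint match the quantized module structure.
loaded_int8.load_state_dict(torch.load("quantized_model.pth"))
```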

I did exactly what you suggested, but the results are different. I tried both with and without fusing, and it just isn't working. I can see that all the zero points and scales are the same, and all the weights are the same, but the results are not the same.

@flash87c, could you share a small repro of what you did so that we can take a look?

Hello, I'm running into the same issue. Have you solved it?