Saving a quantized model

I have converted an fp32 model to an 8-bit model using post-training static quantization. I tried to save the model using torch.save() and torch.jit.save(), but neither method worked. I then tried to save just the state_dict, but when I load it back, the results are not consistent. Is there any other way to save a quantized model?
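Roughly what I'm doing (a minimal, self-contained sketch; SmallModel and the random calibration tensors stand in for my real model and dataset):

```python
import torch
import torch.nn as nn
import torch.quantization

class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = SmallModel().eval()
model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model_fp32)

# calibrate with representative data (random tensors here, real data in practice)
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 32, 32))

model_int8 = torch.quantization.convert(prepared)

# saving just the state_dict is the only variant that doesn't error for me,
# but the reloaded model gives different outputs
torch.save(model_int8.state_dict(), 'model_int8_sd.pt')
```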

If you need any more info please let me know.

Thanks in advance.

Saving/loading the state_dict is the preferred method. Save the state_dict, and before loading it into a quantized model, make sure to apply the same quantization steps (e.g., fusion) so the model structure matches the one you saved from. Also see Loading of Quantized Model
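For example, a minimal sketch of that flow (reusing the SmallModel class and file name from the post above; the fusion line is only needed if you fused modules before saving):

```python
import torch
import torch.quantization

# rebuild the quantized model skeleton exactly as it was built when saving:
# same module class, same qconfig, same fusion, same prepare()/convert()
model = SmallModel().eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
# if modules were fused before prepare() when saving, fuse here too, e.g.:
# torch.quantization.fuse_modules(model, [['conv', 'relu']], inplace=True)
prepared = torch.quantization.prepare(model)
reloaded_int8 = torch.quantization.convert(prepared)

# no calibration needed here: load_state_dict restores the saved
# scales, zero points, and quantized weights
reloaded_int8.load_state_dict(torch.load('model_int8_sd.pt'))
```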

I did exactly what you suggested, but the results are different. I tried with fusing and without fusing, but it's just not working. I can see that all the zero points and scales are the same, and all the weights are the same, but the results are not the same.
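Here is roughly how I'm comparing them (a sketch; model_int8 and reloaded_int8 are the converted and rebuilt models from my snippets above):

```python
import torch

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_saved = model_int8(x)       # model converted in the original session
    out_loaded = reloaded_int8(x)   # model rebuilt + state_dict loaded

print(torch.allclose(out_saved, out_loaded))   # False for me
print((out_saved - out_loaded).abs().max())    # non-trivial difference
```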

@flash87c could you share a small repro of what you did so that we can take a look?

Hello, I am running into the same issue. Have you solved it?


Hello,
I am facing the same issue. Did you find a solution?
Thanks.

cc @Vasiliy_Kuznetsov have we solved the serialization issue? Maybe we can make a post here if that is the case.

You can use the torch.jit.save() API to save quantized models, just as is done in the PyTorch quantization tutorial: (beta) Static Quantization with Eager Mode in PyTorch - PyTorch Tutorials 1.9.1+cu102 documentation
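A minimal sketch of that approach (assuming model_int8 is an eager-mode quantized model that has already been through convert(), as in the earlier snippets):

```python
import torch

# script the eager-mode quantized model, then save/load it with torch.jit
scripted = torch.jit.script(model_int8)
torch.jit.save(scripted, 'model_int8_scripted.pt')

# later, or in another process: no need to rebuild the model structure,
# the TorchScript archive carries it along with the quantized weights
loaded = torch.jit.load('model_int8_scripted.pt')
loaded.eval()
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 32, 32))
```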