Saving a quantized model

I have converted an fp32 model to an 8-bit model using post-training static quantization. I tried to save the model using torch.save() and torch.jit.save(), but neither method worked. I then tried to save just the state_dict, but when I load it back, the results are not consistent. Is there any other way to save a quantized model?
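Roughly what I'm doing (a minimal, self-contained sketch; SmallModel and the random calibration tensors stand in for my real model and dataset):

```python
import torch
import torch.nn as nn
import torch.quantization

class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = SmallModel().eval()
model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model_fp32)

# calibrate with representative data (random tensors here, real data in practice)
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 32, 32))

model_int8 = torch.quantization.convert(prepared)

# saving just the state_dict is the only variant that doesn't error for me,
# but the reloaded model gives different outputs
torch.save(model_int8.state_dict(), 'model_int8_sd.pt')
```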

If you need any more info please let me know.

Thanks in advance.

Saving/loading the state_dict is the preferred method. Save the state_dict, and before loading it into a quantized model, make sure to apply the same quantization steps (e.g., fusion) so the model structure matches the one you saved from. Also see Loading of Quantized Model
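For example, a minimal sketch of that flow (reusing the SmallModel class and file name from the post above; the fusion line is only needed if you fused modules before saving):

```python
import torch
import torch.quantization

# rebuild the quantized model skeleton exactly as it was built when saving:
# same module class, same qconfig, same fusion, same prepare()/convert()
model = SmallModel().eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
# if modules were fused before prepare() when saving, fuse here too, e.g.:
# torch.quantization.fuse_modules(model, [['conv', 'relu']], inplace=True)
prepared = torch.quantization.prepare(model)
reloaded_int8 = torch.quantization.convert(prepared)

# no calibration needed here: load_state_dict restores the saved
# scales, zero points, and quantized weights
reloaded_int8.load_state_dict(torch.load('model_int8_sd.pt'))
```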

I did exactly what you suggested, but the results are different. I tried with fusing and without fusing, but it's just not working. I can see that all the zero points and scales are the same, and all the weights are the same, but the results are not the same.
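Here is roughly how I'm comparing them (a sketch; model_int8 and reloaded_int8 are the converted and rebuilt models from my snippets above):

```python
import torch

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_saved = model_int8(x)       # model converted in the original session
    out_loaded = reloaded_int8(x)   # model rebuilt + state_dict loaded

print(torch.allclose(out_saved, out_loaded))   # False for me
print((out_saved - out_loaded).abs().max())    # non-trivial difference
```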

@flash87c could you share a small repro of what you did so that we can take a look?

Hello, I am running into the same issue. Have you solved it?


Hello,
I am facing the same issue. Did you find a solution?
Thanks.

cc @Vasiliy_Kuznetsov have we solved the serialization issue? Maybe we can make a post here if that is the case.

You can use the torch.jit.save() API to save quantized models, just as is done in the PyTorch quantization tutorial: (beta) Static Quantization with Eager Mode in PyTorch - PyTorch Tutorials 1.9.1+cu102 documentation
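A minimal sketch of that approach (assuming model_int8 is an eager-mode quantized model that has already been through convert(), as in the earlier snippets):

```python
import torch

# script the eager-mode quantized model, then save/load it with torch.jit
scripted = torch.jit.script(model_int8)
torch.jit.save(scripted, 'model_int8_scripted.pt')

# later, or in another process: no need to rebuild the model structure,
# the TorchScript archive carries it along with the quantized weights
loaded = torch.jit.load('model_int8_scripted.pt')
loaded.eval()
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 32, 32))
```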