I converted an FP32 model to an 8-bit model using post-training static quantization. I tried to save the model with torch.save() and torch.jit.save(), but neither works. I then tried saving just the state_dict, but when I load it the results are not consistent. Is there another way to save a quantized model?
Loading/saving the state_dict is the preferred method. Save the state_dict, and before loading it into the quantized model, make sure to repeat the same quantization steps (e.g., fusion, prepare, convert) so the module structure matches. Also see Loading of Quantized Model
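A minimal sketch of what that workflow can look like in eager mode. This is not your exact model; `M` and `build_quantized` are hypothetical stand-ins, and I'm assuming the x86 `fbgemm` backend. The key point is that the loading side runs the identical fuse/prepare/convert pipeline before `load_state_dict`:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class M(nn.Module):
    """Toy float model with quant/dequant stubs (hypothetical stand-in)."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where tensors become quantized
        self.conv = nn.Conv2d(1, 1, 1)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks where tensors become float again
    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

def build_quantized():
    # The same steps must run on both the saving and the loading side.
    m = M().eval()                        # static quantization requires eval mode
    m.qconfig = tq.get_default_qconfig("fbgemm")  # assuming an x86 backend
    tq.fuse_modules(m, [["conv", "relu"]], inplace=True)
    tq.prepare(m, inplace=True)
    m(torch.randn(1, 1, 4, 4))            # calibration pass (dummy data here)
    tq.convert(m, inplace=True)
    return m

# Saving side: quantize, then save only the state_dict.
m1 = build_quantized()
torch.save(m1.state_dict(), "quant_sd.pt")

# Loading side: rebuild with the identical quantization steps, then load.
# load_state_dict overwrites weights, scales, and zero points, so the two
# models should produce identical outputs.
m2 = build_quantized()
m2.load_state_dict(torch.load("quant_sd.pt"))

x = torch.randn(1, 1, 4, 4)
assert torch.equal(m1(x), m2(x))
```

If the fusion list or the order of prepare/convert differs between the two sides, the state_dict keys won't line up with the module structure and loading will either fail or silently mismatch.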
I did exactly what you suggested, but the results are different. I tried with fusing and without fusing, but it's just not working. I can see that all the zero points and scales are the same, and all the weights are the same, but the outputs are not.