How do I save and load a quantized model?

I have quantized ResNet-50, and the per-channel quantized ResNet-50 model gives accuracy as good as the floating-point model. If I save it with torch.jit.save, I can load it with torch.jit.load and run inference.

How can I use torch.save and torch.load on a quantized model?
Will every entry in the state dict have the same scale and zero point?
How can I get each layer's scale and zero point from the quantized model?

How can I use torch.save and torch.load on a quantized model?

Currently, I think we only support torch.save(model.state_dict()) and model.load_state_dict(…). Calling torch.save/torch.load on the model directly is not yet supported, I believe.

Will every entry in the state dict have the same scale and zero point?

No, each layer has its own scale/zero_point, calculated during the calibration step.
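To illustrate why the values differ per layer, here is a minimal sketch in plain Python of how an affine scale/zero_point pair can be derived from the min/max a layer sees during calibration. The function name affine_qparams and the example ranges are made up for illustration, and this is a simplified version of the idea, not PyTorch's actual observer code.

```python
def affine_qparams(observed_min, observed_max, qmin=0, qmax=255):
    """Simplified affine (asymmetric) quantization parameters for an 8-bit range.

    Hypothetical helper for illustration only; real observers handle more cases.
    """
    # Make sure 0.0 is representable, as is common for activations.
    lo = min(observed_min, 0.0)
    hi = max(observed_max, 0.0)
    if hi == lo:
        # Degenerate range: fall back to an identity-like mapping.
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = qmin - round(lo / scale)
    # Clamp so the zero point stays inside the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


# Two layers that saw different activation ranges during calibration
# (the ranges are invented for this example).
layer_a = affine_qparams(-2.0, 6.0)
layer_b = affine_qparams(0.0, 6.0)
print("layer_a:", layer_a)
print("layer_b:", layer_b)
```

Because each layer observes a different range, each one ends up with its own pair of values in the state dict.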

How can I get each layer's scale and zero point from the quantized model?

You can print the quantized model and it will show scale and zero_point, e.g.:

> print(torch.nn.quantized.Conv2d(3, 3, 3))
QuantizedConv2d(3, 3, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0)

Thank you @jerryzh168

I was able to save with model.state_dict(), but I was not able to load the model with model.load_state_dict(). It raised a KeyError.

Secondly, if I save with torch.jit.save(torch.jit.script(pcqmodel), "quantization_per_channel_model.pth"), I am not able to see the quantization info after loading the model. This is referred to in this issue.

Are you using the most recent version? Could you try again with the PyTorch nightly builds?

Also, check whether it is just the __repr__ that is not showing the info or whether the quant params are really missing: try getting the scale and zero_point directly.
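For reading the values directly instead of relying on the printed repr, something like the following might help. It is only a sketch: describe_quant_params is a hypothetical helper (not a PyTorch API) that assumes quantized modules expose their output parameters as scale and zero_point attributes, as the eager-mode quantized Conv2d shown earlier does. It also needs a PyTorch build with a quantized backend available.

```python
import torch


def describe_quant_params(model):
    """Collect {module_name: (scale, zero_point)} for modules that expose both.

    Hypothetical helper for illustration; the top-level module has the name "".
    """
    params = {}
    for name, module in model.named_modules():
        scale = getattr(module, "scale", None)
        zero_point = getattr(module, "zero_point", None)
        if scale is not None and zero_point is not None:
            params[name] = (float(scale), int(zero_point))
    return params


conv = torch.nn.quantized.Conv2d(3, 3, 3)
output_params = describe_quant_params(conv)
print("output scale/zero_point:", output_params)

# The weight is a quantized tensor with its own parameters.
weight = conv.weight()
if weight.qscheme() == torch.per_tensor_affine:
    weight_params = (weight.q_scale(), weight.q_zero_point())
else:
    # Per-channel weights carry one scale/zero_point per output channel.
    weight_params = (weight.q_per_channel_scales(),
                     weight.q_per_channel_zero_points())
print("weight scale/zero_point:", weight_params)
```

If these attributes hold sensible values on the loaded model, the information is there and only the printout is lacking.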

Be sure you run the whole post-training preparation process (layer fusion, torch.quantization.prepare(), and torch.quantization.convert()) on the model you are loading into before calling load_state_dict().
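A minimal sketch of that order of operations on a toy model, assuming eager-mode post-training static quantization and a PyTorch build with a quantized backend. TinyNet and build_quantized are made-up names for this example, and an in-memory buffer stands in for a file on disk.

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Toy model (hypothetical) with the stubs eager-mode quantization expects."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 4, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))


def build_quantized(calibration_batches):
    """Fuse, prepare, calibrate, and convert a fresh TinyNet."""
    model = TinyNet().eval()
    model.qconfig = torch.quantization.get_default_qconfig(
        torch.backends.quantized.engine)
    torch.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)
    torch.quantization.prepare(model, inplace=True)
    with torch.no_grad():
        for batch in calibration_batches:
            model(batch)  # observers record activation ranges here
    torch.quantization.convert(model, inplace=True)
    return model


# Quantize once and save only the state_dict.
model = build_quantized([torch.randn(2, 3, 8, 8) for _ in range(4)])
buffer = io.BytesIO()  # stands in for a file path
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# To load: rebuild the same quantized structure first, then load the weights.
# The calibration data used here does not matter, since the loaded state_dict
# overwrites the scales and zero points.
restored = build_quantized([torch.randn(1, 3, 8, 8)])
restored.load_state_dict(torch.load(buffer))

x = torch.randn(1, 3, 8, 8)
same = torch.equal(model(x), restored(x))
print("outputs match:", same)
```

The point is that load_state_dict() is called on a model that has already been fused, prepared, and converted, so its keys match the quantized state_dict; loading into the original floating-point model would not line up.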

Has this been fixed? I’m unable to save and load quantized models even after following all the steps.

Has this been fixed? I’m unable to save and load quantized models even after following all the steps.

Do you have a reproducible example on a toy model?