Save and load model.state_dict() successfully, but with the same input, the outputs are different

Hi, I save the quantized model with

torch.save(model.state_dict(), "quanted_model.pkl")

and load it with

model.fuse_model()
model.encoder.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model.encoder, inplace=True)
torch.quantization.convert(model.encoder, inplace=True)
model.load_state_dict(torch.load(modelfile,map_location='cpu'))

The quantization steps are the same.

When I skip the calibration step, the reloaded model gives the same output. However, after I run calibration, the output differs between the reloaded model and the quantized model.

I compared the state_dict() and the metadata, and they are the same.

My model is a transformer.

OK, this is because the new parameters of QuantizedLayerNorm were not saved.
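For anyone who wants to check this on their own PyTorch version, here is a minimal sketch of the same save/reload sequence. TinyNorm is a hypothetical stand-in for the encoder (not the original model), and the key names "norm.scale" and "norm.zero_point" follow from that stand-in's attribute name. The idea is that the reloaded model is converted without calibration, so any calibrated value that is not in the state_dict cannot survive the reload.

```python
import io
import torch
import torch.nn as nn


class TinyNorm(nn.Module):
    """Hypothetical stand-in for the transformer encoder."""

    def __init__(self, dim=8):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.norm = nn.LayerNorm(dim)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.norm(self.quant(x)))


def build_quantized(calib=None):
    """Prepare and convert; calibrate only when data is given."""
    model = TinyNorm().eval()
    model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
    torch.quantization.prepare(model, inplace=True)
    if calib is not None:
        model(calib)  # observers record activation ranges
    torch.quantization.convert(model, inplace=True)
    return model


torch.manual_seed(0)
x = torch.randn(16, 8)

# Calibrated, quantized model saved to an in-memory buffer.
original = build_quantized(calib=x)
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# Same load sequence as in the post: rebuild without calibration, then load.
reloaded = build_quantized()
reloaded.load_state_dict(torch.load(buffer, map_location="cpu"))

# Calibrated values that are missing here would be lost on reload.
saved_keys = set(original.state_dict().keys())
missing = {"norm.scale", "norm.zero_point"} - saved_keys
outputs_match = torch.equal(original(x), reloaded(x))
print("missing from state_dict:", sorted(missing))
print("outputs match:", outputs_match)
```

If the missing set is non-empty or the outputs differ, the quantized LayerNorm's scale and zero_point were not saved, which would match the behavior described above. On recent releases I would expect them to be stored, but that is worth confirming on the version you actually run.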

Hello, crane. How did you quantize the layer_norm module? I tried to use this code to convert the encoder-decoder transformer from fp32 to int8, but it seems the layer_norm is still fp32.

import torch
from torch.quantization import QuantStub, DeQuantStub, float_qparams_weight_only_qconfig, default_qconfig

# eager mode

backend = "fbgemm"
#model.qconfig = torch.quantization.get_default_qconfig(backend)
model.qconfig = default_qconfig
model.encoder.embeddings.qconfig = float_qparams_weight_only_qconfig
model.decoder.embeddings.qconfig = float_qparams_weight_only_qconfig

model_fp32_prepared = torch.quantization.prepare(model)
model_int8 = torch.quantization.convert(model_fp32_prepared)

And when I try to run inference with the model, it raises an error: Could not run 'quantized::layer_norm' with arguments from the 'CPU' backend.
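One common cause of this error, offered as a guess rather than a confirmed diagnosis of the model above, is that a float tensor reaches a module that has already been converted to its quantized version. In eager mode the quantized op expects a quantized input, which usually means placing a QuantStub before it and a DeQuantStub after it. The sketch below uses a hypothetical NormBlock (not the transformer from the post) to show both cases; the helper names quantize and runs_ok are also made up for illustration.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, default_qconfig


class NormBlock(nn.Module):
    """Hypothetical stand-in for one sub-block of the transformer."""

    def __init__(self, dim=8, use_stubs=True):
        super().__init__()
        # Without stubs the LayerNorm receives a plain float tensor.
        self.quant = QuantStub() if use_stubs else nn.Identity()
        self.norm = nn.LayerNorm(dim)
        self.dequant = DeQuantStub() if use_stubs else nn.Identity()

    def forward(self, x):
        return self.dequant(self.norm(self.quant(x)))


def quantize(block, calib):
    """Eager-mode static quantization: prepare, calibrate, convert."""
    block.eval()
    block.qconfig = default_qconfig
    prepared = torch.quantization.prepare(block)
    prepared(calib)  # calibration pass so observers see real ranges
    return torch.quantization.convert(prepared)


def runs_ok(model, x):
    """Return True if the forward pass succeeds, False on a runtime error."""
    try:
        model(x)
        return True
    except RuntimeError:  # NotImplementedError is a subclass of RuntimeError
        return False


torch.manual_seed(0)
x = torch.randn(4, 8)

with_stubs = quantize(NormBlock(use_stubs=True), x)
without_stubs = quantize(NormBlock(use_stubs=False), x)

# The module path shows whether LayerNorm was swapped for a quantized class.
print("converted class:", type(with_stubs.norm).__module__)
print("with stubs runs:", runs_ok(with_stubs, x))
print("without stubs runs:", runs_ok(without_stubs, x))
```

Printing the class of each submodule after convert is also a quick way to see which layers were really converted and which are still fp32. If some layers should stay in fp32, another option is to set their qconfig to None before calling prepare.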