When I skip the calibration step, I get the same output from the reloaded model; however, after I run calibration, the outputs of the reloaded model and the quantized model differ.
I compared the state_dict() and the metadata, but they are the same.
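For reference, this is the save/reload flow I am following, reduced to a toy model (Net, the tensor shapes, and the file name are placeholders, not my actual transformer). As far as I understand, the reloaded model has to be prepared and converted first, and only then receive the quantized state_dict:

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, default_qconfig, prepare, convert

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.fc = nn.Linear(8, 8)
        self.dequant = DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = default_qconfig
prepare(model, inplace=True)
model(torch.randn(4, 8))              # calibration pass
convert(model, inplace=True)
torch.save(model.state_dict(), "quantized.pth")

# Reload: rebuild the same prepared + converted architecture first.
# No calibration data is needed here, since the scale/zero_point
# values come from the saved state_dict when it is loaded.
reloaded = Net().eval()
reloaded.qconfig = default_qconfig
prepare(reloaded, inplace=True)
convert(reloaded, inplace=True)
reloaded.load_state_dict(torch.load("quantized.pth"))

x = torch.randn(4, 8)
# Outputs should now match exactly, since weights and qparams are identical.
assert torch.equal(model(x), reloaded(x))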
Hello, crane. How do you quantize the layer_norm module? I tried using this code to convert an encoder-decoder transformer from fp32 to int8, but it seems the layer_norm stays in fp32.
from torch.quantization import QuantStub, DeQuantStub, float_qparams_weight_only_qconfig, default_qconfig
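As far as I can tell, nn.LayerNorm does have a quantized counterpart (torch.nn.quantized.LayerNorm) in the default static quantization mappings, but it is only swapped during convert() if it carries a qconfig and its input comes from inside the QuantStub/DeQuantStub region. This is the minimal check I am running (the Block module and its shapes are just an illustration, not my real encoder-decoder):

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, default_qconfig, prepare, convert

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.linear = nn.Linear(16, 16)
        self.norm = nn.LayerNorm(16)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)      # LayerNorm input must be quantized
        x = self.linear(x)
        x = self.norm(x)
        return self.dequant(x)

m = Block().eval()
m.qconfig = default_qconfig    # the qconfig must cover the LayerNorm too
prepare(m, inplace=True)
m(torch.randn(2, 16))          # calibration
convert(m, inplace=True)

# If conversion succeeded, the module type changes; expect
# torch.nn.quantized.LayerNorm (torch.ao.nn.quantized.LayerNorm in newer versions).
print(type(m.norm))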