Hi, I am using dynamic quantization on my model and trying to compute the size reduction. I assume the number of parameters before and after quantization is the same, and only the number of bits per element changes. However, when I check the model's state_dict or parameters, the quantized weights are not there, and they are not in the buffers either. I was trying to estimate the model size as numel * element_size, but the quantized params cannot be found that way. I saw in the pruning docs that I can compute the number of pruned params from the zeros in the _mask buffer, but I cannot find docs on how quantized params are saved. Could anyone please help? Thanks.

see this tutorial:

https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html

which includes a method for looking at the size of the model.
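The size check in that tutorial works by serializing the model to disk and reading the file size, which sidesteps the packed-params issue entirely. A sketch of that approach (the helper name `print_size_of_model` follows the tutorial; the temp filename is arbitrary):

```python
import os

import torch


def print_size_of_model(model):
    # Serialize the state_dict to a temp file and report its on-disk size.
    # This counts whatever the model actually stores (including packed
    # quantized params), so it works for both fp32 and quantized models.
    torch.save(model.state_dict(), "temp.p")
    size_mb = os.path.getsize("temp.p") / 1e6
    print(f"Size (MB): {size_mb:.3f}")
    os.remove("temp.p")
    return size_mb
```

Calling this before and after `torch.quantization.quantize_dynamic` gives you the actual reduction without having to enumerate parameters.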

in general the quantized weight is not saved as a plain quantized tensor with X elements of Y bits each. Instead it is stored as packed params, which bundle the quantized weight together with other intermediate values (quantization parameters, bias) in the layout the quantized matmul kernels need to run fast. That is why it does not show up in model.parameters() or the buffers.