Where is the quantized param saved?

HDCharles · May 15, 2023, 8:59pm

See this tutorial, which includes a method for looking at the size of the model:

https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html
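
As a rough sketch of that kind of size check (not copied from the tutorial): serialize the state_dict to a temporary file and read the file size. The toy model and the helper name `model_size_bytes` are made up for illustration, and this assumes a PyTorch build with a quantized backend available.

```python
import os
import tempfile

import torch
import torch.nn as nn


def model_size_bytes(model: nn.Module) -> int:
    """Save the model's state_dict to a temporary file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pt")
        torch.save(model.state_dict(), path)
        return os.path.getsize(path)


# Toy float model, purely illustrative.
float_model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamically quantize the Linear layers to int8 weights.
quantized_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

float_size = model_size_bytes(float_model)
quantized_size = model_size_bytes(quantized_model)
print(f"float32: {float_size} bytes, dynamic int8: {quantized_size} bytes")
```

The quantized file will not be exactly 4x smaller, since biases stay in float and the serialized file carries some extra metadata.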

In general, the quantized weight is not simply saved as a quantized tensor with X elements of Y bits each. Instead, it is saved as packed params, which include other intermediate values that the quantized matmul needs in order to run faster.
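
If you want to see where the weight ends up, here is a small sketch that lists the state_dict entries of a dynamically quantized Linear layer and unpacks the weight. The tiny model is only an example, and the exact key names and packed layout may differ between PyTorch versions and backends, so treat the printed output as something to inspect rather than rely on.

```python
import torch
import torch.nn as nn

# Tiny illustrative model with a single Linear layer.
float_model = nn.Sequential(nn.Linear(4, 3))
qmodel = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

# The weight does not appear as a plain "0.weight" tensor;
# look for entries that mention the packed params instead.
keys = list(qmodel.state_dict().keys())
for key in keys:
    print(key)

# weight() unpacks the packed params and returns a quantized tensor.
qweight = qmodel[0].weight()
print(qweight.dtype)  # quantized dtype of the unpacked weight

# int_repr() exposes the raw integer values backing the quantized tensor.
raw = qweight.int_repr()
print(raw.dtype, raw.shape, raw.element_size(), "byte(s) per element")
```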