Hello. I am struggling to assess the on-disk storage footprint of quantized models. I would like to compare the storage savings obtained from quantization. I can work this out theoretically, but I would also like to save the model for deployment.
So, basically, my question is: how can I store a PyTorch quantized model (quantized with a custom scheme, not the built-in quantization methods) in an encoding that minimizes its size in accordance with the bitwidth used?
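For concreteness, here is roughly what I have in mind (just an illustrative sketch of my own, assuming symmetric per-tensor 4-bit quantization; the function names and file name are made up): pack two 4-bit values per byte into uint8 tensors and save those with torch.save, so the file size actually reflects the reduced bitwidth rather than storing int8 or float32 per value.

```python
import torch

def quantize_and_pack_4bit(weight: torch.Tensor):
    # Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]
    scale = weight.abs().max() / 7.0
    q = torch.clamp(torch.round(weight / scale), -8, 7).to(torch.int8)

    # Shift to unsigned [0, 15] so two values fit in one byte
    u = (q + 8).to(torch.uint8).flatten()
    if u.numel() % 2:  # pad to an even number of values
        u = torch.cat([u, u.new_zeros(1)])
    packed = (u[0::2] << 4) | u[1::2]
    return packed, scale, weight.shape

def unpack_and_dequantize_4bit(packed, scale, shape):
    hi = (packed >> 4).to(torch.int8) - 8
    lo = (packed & 0x0F).to(torch.int8) - 8
    u = torch.stack([hi, lo], dim=1).flatten()[: torch.Size(shape).numel()]
    return (u.to(torch.float32) * scale).reshape(shape)

# Pack every parameter of a toy model and save the packed dict;
# the resulting file is roughly 1/8 the size of the float32 state_dict.
model = torch.nn.Linear(512, 512)
packed_state = {
    name: quantize_and_pack_4bit(p.detach())
    for name, p in model.named_parameters()
}
torch.save(packed_state, "model_4bit_packed.pt")
```

Is something along these lines the right approach, or is there a more standard way to serialize sub-8-bit weights compactly?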