Hello everyone. I recently used dynamic quantization to quantize a model. When I call torch.quantization.quantize_dynamic(model, dtype=torch.qint8), the model shrinks from 39 MB to 30 MB, but with torch.quantization.quantize_dynamic(model, dtype=torch.float16) the model size does not change at all. Does anybody know why, or am I quantizing to float16 the wrong way?
I’d appreciate it if anybody could help me! Thanks in advance!