Hi,
Dynamic quantization only helps in reducing the model size for models that use Linear and LSTM modules. In the case of resnet18, the model consists of conv layers, which do not have dynamic quantization support yet. For your model, can you check whether it has Linear layers?
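As a quick check, you can run quantize_dynamic over resnet18 and list which modules actually get swapped; a minimal sketch, assuming a torchvision model (only the final fc layer should show up):

```python
import torch
import torchvision

# resnet18 is conv-heavy; only the final fc layer is a Linear module.
model = torchvision.models.resnet18(pretrained=True)

# Dynamic quantization only replaces the module types listed here;
# Conv2d is not supported, so the conv weights stay in fp32.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# List the modules that were actually converted.
for name, module in quantized.named_modules():
    if isinstance(module, torch.nn.quantized.dynamic.Linear):
        print(name, "->", module)
```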
@raghuramank100
Thanks for your response.
But dynamic quantization is able to reduce the VGG16 model size from 553 MB to 182 MB, and VGG16 is mostly convolution layers, so why such a drastic change?
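For reference, where VGG16's parameters actually live can be checked with a quick tally per module type (a sketch using torchvision's vgg16; roughly 90% of its ~138M parameters sit in the three Linear layers of the classifier head):

```python
import torch
import torchvision

model = torchvision.models.vgg16(pretrained=False)

# Count parameters per module type to see where the bytes are.
counts = {}
for module in model.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        key = type(module).__name__
        counts[key] = counts.get(key, 0) + sum(p.numel() for p in module.parameters())

for key, n in counts.items():
    print(f"{key}: {n / 1e6:.1f}M params")
```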
```python
from transformers import BertConfig, BertModel

print(qconfig)  # qconfig is assumed to be defined earlier in the script
config = BertConfig.from_json_file(bert_config_file)
print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertModel.from_pretrained("./distlangpytorch/")
```
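Dynamic quantization could then be applied to this model roughly as below; BERT is dominated by Linear layers, so the on-disk size should drop noticeably (print_model_size is a hypothetical helper for illustration):

```python
import os
import torch

def print_model_size(model, label):
    # Serialize the weights and report the file size on disk.
    torch.save(model.state_dict(), "tmp.pt")
    print(f"{label}: {os.path.getsize('tmp.pt') / 1e6:.1f} MB")
    os.remove("tmp.pt")

print_model_size(model, "fp32")

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print_model_size(quantized_model, "int8 (dynamic)")
```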
In this mode of quantization, the model has to be calibrated (i.e., evaluated on representative data after prepare()) to capture the quantization parameters (zero point and scale) that are needed to quantize the model's weights and activations.
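A minimal sketch of that prepare → calibrate → convert flow, using a toy model for illustration:

```python
import torch
import torch.nn as nn

# QuantStub/DeQuantStub mark where tensors are converted to/from int8.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")

# prepare() inserts observers that record activation ranges.
prepared = torch.quantization.prepare(model)

# Calibration: feed representative data so the observers can capture
# the qparams (scale and zero point) for each tensor.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 16))

# convert() uses the observed stats to produce the quantized modules.
quantized = torch.quantization.convert(prepared)
print(quantized.fc)  # QuantizedLinear with scale/zero_point attached
```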
Does Graph Mode quantization suffer from this issue as well? I recently tried to quantize a JIT-saved model and did not see any difference: the model size is nearly the same, but the forward pass has gotten worse (nearly two times slower).
What could be the underlying issue here?