Dynamic Quantization not reducing model size

Raghav_Gurbaxani · October 30, 2019, 3:19am

Hi

I am trying to use the quantize_dynamic function, but in my case the quantized model is the same size as the original model.

import os
import torch
from craft import CRAFT
trained_model='craft_mlt_25k.pth'

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')
    
from collections import OrderedDict
def copyStateDict(state_dict):
    if list(state_dict.keys())[0].startswith("module"):
        start_idx = 1
    else:
        start_idx = 0
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = ".".join(k.split(".")[start_idx:])
        new_state_dict[name] = v
    return new_state_dict

net = CRAFT()     # initialize
net.load_state_dict(copyStateDict(torch.load(trained_model, map_location='cpu')))
net.eval()
print_size_of_model(net)
quantized = torch.quantization.quantize_dynamic(net, dtype=torch.qint8)
print_size_of_model(quantized)

Both the original and the quantized model are 83.14 MB. Why doesn’t the model size change?
Any suggestions would be appreciated.

Similarly, when I tried to quantize the pretrained ResNet18 model from torchvision, the size only changed from 46 MB to 45 MB.

raghuramank100 · October 30, 2019, 7:26am

Hi,
Dynamic quantization only helps in reducing the model size for models that use Linear and LSTM modules. For the case of resnet18, the model consists of conv layers which do not have dynamic quantization support yet. For your model, can you check if it has linear layers?
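One way to run that check is to count parameters per module type. This is only a sketch: `param_count_by_type` is a hypothetical helper name, and the toy model stands in for your own network (for example the loaded CRAFT instance). Entries such as `Linear` and `LSTM` are the ones dynamic quantization can shrink.

```python
from collections import Counter

import torch.nn as nn


def param_count_by_type(model):
    """Sum parameter counts per module type (e.g. Conv2d, Linear)."""
    counts = Counter()
    for module in model.modules():
        # recurse=False counts only parameters owned directly by this module,
        # so nothing is counted twice through its parent containers.
        n = sum(p.numel() for p in module.parameters(recurse=False))
        if n:
            counts[type(module).__name__] += n
    return counts


# Hypothetical toy model; replace it with your own network.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # 3*8*3*3 weights + 8 biases
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),                 # 8*4 weights + 4 biases
)

counts = param_count_by_type(toy)
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name}: {n} parameters ({100 * n / total:.1f}% of total)")
```

If nearly all parameters sit under `Conv2d`, little or no size reduction should be expected from dynamic quantization.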

Raghav_Gurbaxani · November 2, 2019, 6:27pm

@raghuramank100
Thanks for your response.
But for VGG16, dynamic quantization is able to reduce the model size from 553 MB to 182 MB, even though VGG16 is mostly convolution layers. Why such a drastic change then?

jerryzh168 · December 18, 2019, 12:39am

It’s probably because most of the weights are in the last few Linear layers?

raghuramank100 · December 19, 2019, 4:26am

Yes, in the case of VGG16, the fc layers at the end of the network contain the bulk of the weights.
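A rough back-of-the-envelope check of those numbers, assuming the standard VGG16 layer shapes (13 3x3 conv layers, a 512*7*7 input to the classifier, 1000 classes) and assuming that dynamic quantization stores only the Linear weights as int8 while conv layers and biases stay float32. Serialization overhead and the stored scales and zero points are ignored.

```python
# VGG16 conv stack as (in_channels, out_channels) for each 3x3 convolution.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# Fully connected classifier as (in_features, out_features).
FC_LAYERS = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(cin * cout * 3 * 3 + cout for cin, cout in CONV_LAYERS)
fc_weights = sum(fin * fout for fin, fout in FC_LAYERS)
fc_biases = sum(fout for _, fout in FC_LAYERS)
total_params = conv_params + fc_weights + fc_biases

# float32 stores 4 bytes per value; qint8 stores 1 byte per value.
fp32_mb = total_params * 4 / 1e6
# Assumption: only Linear weights become int8; everything else stays float32.
dynamic_mb = (fc_weights * 1 + (conv_params + fc_biases) * 4) / 1e6

fc_share = 100 * (fc_weights + fc_biases) / total_params
print(f"fc share of parameters: {fc_share:.1f}%")
print(f"estimated float32 size: {fp32_mb:.1f} MB")
print(f"estimated size after dynamic quantization: {dynamic_mb:.1f} MB")
```

Under these assumptions the estimates land close to the 553 MB and 182 MB reported above, which is consistent with the fc layers holding most of the weights.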

Sagar_Gupta · June 2, 2020, 1:38pm

Hi,
I am trying to quantize a BERT model, but the size is not reducing, and I wonder why. Can someone help me?

Here is the code snippet:

import torch
import os

from transformers.modeling_bert import BertConfig, BertForPreTraining, load_tf_weights_in_bert, BertModel

tf_checkpoint_path = "./distlang/"
bert_config_file = "./config.json"
pytorch_dump_path = "./distlangpytorch/"

device = "cpu"

torch.backends.quantized.engine = 'qnnpack'

qconfig = torch.quantization.get_default_qconfig('fbgemm')

print(qconfig)
config = BertConfig.from_json_file(bert_config_file)
print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertModel.from_pretrained("./distlangpytorch/")

model.to(device)

torch.quantization.prepare(model)

quantized_model = torch.quantization.convert(model)

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)

quantized_output_dir = "./quantized_model"
if not os.path.exists(quantized_output_dir):
    os.makedirs(quantized_output_dir)

BOLLOJU_ARAVIND · June 3, 2020, 2:53am

Hi @Sagar_Gupta,

In this mode of quantization, the model has to be calibrated (run your model on sample data after prepare()) to capture the qparams (zero point and scale),
which are needed to quantize the model, i.e. the weights and so on.
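A minimal sketch of what calibration looks like in eager-mode post-training static quantization, assuming a CPU build of PyTorch with a quantized backend available. `TinyNet` and the random calibration inputs are hypothetical placeholders, not part of the BERT code above. Note that `prepare()` only inserts observers into modules that carry a `qconfig`, so the qconfig has to be assigned to the model first.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; the stubs mark where quantization starts and ends."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


model = TinyNet().eval()

# Use the same backend for the engine and for the qconfig.
backend = torch.backends.quantized.engine
model.qconfig = torch.quantization.get_default_qconfig(backend)

# prepare() returns a copy of the model with observers inserted.
prepared = torch.quantization.prepare(model)

# Calibration: run representative inputs so the observers record value ranges.
# Random tensors are only a placeholder for real data.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(32, 16))

# convert() uses the observed ranges to compute scale and zero point
# and swaps in quantized modules.
quantized = torch.quantization.convert(prepared)
print(type(quantized.fc))
```

For a model made mostly of Linear layers, `torch.quantization.quantize_dynamic` (as used in the first post) needs no calibration step and may be the simpler route.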

deepak_mangla · August 19, 2020, 7:23am

Hi @BOLLOJU_ARAVIND, can you point me to a guide for doing this?

Shisho_Sama · October 14, 2020, 3:36am

Does graph mode quantization suffer from this issue as well? I recently tried to quantize a JIT-saved model and did not see any difference: the model size is nearly the same, but the forward pass has gotten worse (nearly two times slower).
What could be the underlying issue here?