Quantization not Decreasing Model Size (Static and QAT)

Hi

I am trying to quantize a text detection model based on MobileNet (model definition here).

After inserting the quant and dequant stubs, fusing all the conv+bn+relu and conv+relu blocks, and replacing cat with skip_add.cat(), I perform static quantization (script: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_quantization.py).
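
For context, the cat replacement uses nn.quantized.FloatFunctional so the concatenation can be observed and quantized; a minimal sketch (illustrative names, not my actual model):

import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x, y):
        # replaces torch.cat([x, y], dim=1) so eager mode quantization can handle it
        return self.skip_add.cat([x, y], dim=1)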

After performing quantization, the model size doesn't go down (in fact, it increases):

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 7.928258

I have even printed the final quantized model here

I also changed the qconfig to fused_model.qconfig = torch.quantization.default_qconfig, but the quantized model size is still 6.715115 MB.
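
(For reference, the static quantization tutorial uses a backend-specific qconfig rather than default_qconfig; a minimal sketch assuming the same fused_model variable:)

# 'fbgemm' targets x86 servers and uses per-channel weight quantization
fused_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')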

Why doesn't the model size reduce?


Looking at the model definition you posted, it looks like it is not yet quantized. One missing piece is calibration. You can add a calibration step after you call prepare and before you call convert:

torch.quantization.prepare(fused_model, inplace=True)

# calibrate: feed example inputs through the prepared model so the
# observers can record activation ranges
for inputs in your_dataset:
    fused_model(inputs)

# convert swaps the observed modules for their quantized equivalents
print('Quantized model Size:')
quantized = torch.quantization.convert(fused_model, inplace=False)
print_size_of_model(quantized)

Hi @Vasiliy_Kuznetsov

Thank you for your input. I have updated my script to pass a few images through the fused model for calibration.

Please see the updated script here

But the quantized model size is still bigger than the original model:

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 6.712286

There seems to be some improvement due to the calibration, but the quantized model size is still not satisfactory compared to the original size. :frowning:

Could you suggest what's going wrong here?

@Vasiliy_Kuznetsov

I also tried a script with Quantization Aware Training:

But the quantized model is still bigger than the original model. :frowning: :no_mouth:

I don’t know what’s going wrong here

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 6.712286

QAT model Size:
Size (MB): 6.712286

In the paste here (https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/quantized_model.txt), the model doesn't look quantized. One would expect to see QuantizedConv instead of Conv and QuantizedLinear instead of Linear. One thing to try is to make sure you run the convert step and check that you see the quantized module equivalents afterwards.
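
A quick way to check is to walk the converted model and print the module types; a minimal sketch, assuming the converted model is called quantized:

import torch.nn as nn

# float Conv2d/Linear remaining after convert usually mean something was
# skipped (missing qconfig, an unfused pattern, or ops outside the stubs)
for name, module in quantized.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        print('still float:', name, type(module).__name__)
    elif 'quantized' in type(module).__module__:
        print('quantized:', name, type(module).__name__)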

Hi @Vasiliy_Kuznetsov

Please check the updated quantized_model now.

It seems to have quantized convolutions (line 100 onwards). I don't know why the layers before line 100 do not have quantized modules.

Do you think my QuantStub and DeQuantStub placement is incorrect?
Here's the model (with quant and dequant stubs).

Main script here: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_qat.py

I suspect my quant and dequant stub placement may be incorrect, but apart from that I've followed all the steps from the static quantization tutorial.
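
For reference, the placement pattern in the static quantization tutorial is a single QuantStub at the input and a single DeQuantStub at the output, so everything in between runs quantized after convert; a minimal sketch with an illustrative wrapper (not my actual detector):

import torch
import torch.nn as nn

class WrappedDetector(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.backbone = backbone
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # fp32 -> quantized at the model input
        x = self.backbone(x)    # inner modules see quantized tensors
        return self.dequant(x)  # quantized -> fp32 at the model output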

Really appreciate your help.

@Vasiliy_Kuznetsov any update on this?

Hi Raghav,

For post training quantization, we want the model to be in eval mode (see https://github.com/pytorch/pytorch/blob/530d48e93a3f04a5ec63a1b789c19a5f775bf497/torch/quantization/fuse_modules.py#L63). So, you can add a model.eval() call before you fuse modules:

model.eval()
torch.quantization.fuse_modules(...)
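
Putting the steps together, the eager mode post-training flow looks roughly like this (a sketch; modules_to_fuse and calibration_loader are placeholders):

model.eval()                              # eval mode before fusing and calibrating
fused = torch.quantization.fuse_modules(model, modules_to_fuse)
fused.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(fused, inplace=True)

with torch.no_grad():                     # calibration pass, no gradients needed
    for inputs in calibration_loader:
        fused(inputs)

quantized = torch.quantization.convert(fused, inplace=False)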

Hey @Vasiliy_Kuznetsov! I am also experiencing a similar issue, but only when quantising torch.nn.GRU with this script:

import torch
import torch.nn as nn
from torch.ao.quantization.qconfig_mapping import QConfigMapping
import torch.quantization.quantize_fx as quantize_fx
import copy

class UserModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.l = nn.GRU(128, 128, 128, batch_first=True, bidirectional=True)
    
    def forward(self, x):
        return self.l(x)

model_fp = UserModule()

model_to_quantize = copy.deepcopy(model_fp)
model_to_quantize.eval()
qconfig_mapping = QConfigMapping().set_global(torch.quantization.default_dynamic_qconfig)
# a tuple of one or more example inputs is needed to trace the model

model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, None)
model_quantized = quantize_fx.convert_fx(model_prepared)

def print_size_of_model(model):
    import os
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model_fp)
print_size_of_model(model_quantized)

The exact same script, but with

self.l = nn.GRU(128, 128, 128, batch_first=True, bidirectional=True)

works like a charm.

I am using torch 1.13, and, as listed here, nn.GRU should be supported with dynamic quantization.

Additional info that might be useful: I am able to quantise GRUCell, RNNCell and LSTMCell properly.

Any idea why the model is not getting quantised?

Thanks!

Cheers,
Francesco.

Hi @fpaissan , sorry for the late reply.

It looks like the FX graph mode quantization backend configuration is missing the dynamic quantization entry for torch.nn.GRU. I filed FX graph mode quant: backendconfig configuration missing for torch.nn.GRU · Issue #90394 · pytorch/pytorch · GitHub to track this. Our team can fix this.

@fpaissan, as a workaround you could try using the Eager mode torch.ao.quantization.quantize_dynamic API, which should support torch.nn.GRU.
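
Something along these lines (a sketch, reusing the names from your script above):

import torch
import torch.nn as nn

# dynamically quantize only the GRU layers: weights are stored as int8,
# activations are quantized on the fly at inference time
model_dq = torch.ao.quantization.quantize_dynamic(
    model_fp, qconfig_spec={nn.GRU}, dtype=torch.qint8
)
print_size_of_model(model_dq)  # size should drop if the GRU was swapped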