Quantization not Decreasing Model Size (Static and QAT)

Hi

I am trying to quantize a text detection model based on MobileNet (model definition here).

After inserting the quant and dequant stubs, fusing all the conv+bn+relu and conv+relu blocks, and replacing cat with skip_add.cat(), I perform static quantization (script: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_quantization.py).
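
For context, the cat replacement uses nn.quantized.FloatFunctional so the concatenation can be observed and quantized; a minimal sketch (illustrative names, not my actual model):

import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x, y):
        # replaces torch.cat([x, y], dim=1) so eager mode quantization can handle it
        return self.skip_add.cat([x, y], dim=1)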

After performing quantization, the model size doesn't go down (in fact, it increases):

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 7.928258

I have even printed the final quantized model here

I also changed the qconfig to fused_model.qconfig = torch.quantization.default_qconfig, but the quantized model size is still 6.715115 MB.
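
(For reference, the static quantization tutorial uses a backend-specific qconfig rather than default_qconfig; a minimal sketch assuming the same fused_model variable:)

# 'fbgemm' targets x86 servers and uses per-channel weight quantization
fused_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')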

Why doesn't the model size reduce?


Looking at the model definition you posted, it looks like it is not yet quantized. One missing piece is calibration. You can add a calibration step after you call prepare and before you call convert:

torch.quantization.prepare(fused_model, inplace=True)

# calibrate: feed example inputs through the prepared model so the
# observers can record activation ranges
for inputs in your_dataset:
    fused_model(inputs)

# convert swaps the observed modules for their quantized equivalents
print('Quantized model Size:')
quantized = torch.quantization.convert(fused_model, inplace=False)
print_size_of_model(quantized)

Hi @Vasiliy_Kuznetsov

Thank you for your input. I have updated my script to pass a few images through the fused model for calibration.

Please see the updated script here

But the quantized model size is still bigger than the original model:

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 6.712286

There seems to be some improvement due to the calibration, but the quantized model size is still not satisfactory compared to the original size. :frowning:

Could you suggest what's going wrong here?

@Vasiliy_Kuznetsov

I also tried a script with Quantization Aware Training:

But the quantized model is still bigger than the original model. :frowning: :no_mouth:

I don’t know what’s going wrong here

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 6.712286

QAT model Size:
Size (MB): 6.712286

In the paste here (https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/quantized_model.txt), the model doesn't look quantized. One would expect to see QuantizedConv instead of Conv and QuantizedLinear instead of Linear. One thing to try is to make sure you run the convert step and check that you see the quantized module equivalents afterwards.
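
A quick way to check is to walk the converted model and print the module types; a minimal sketch, assuming the converted model is called quantized:

import torch.nn as nn

# float Conv2d/Linear remaining after convert usually mean something was
# skipped (missing qconfig, an unfused pattern, or ops outside the stubs)
for name, module in quantized.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        print('still float:', name, type(module).__name__)
    elif 'quantized' in type(module).__module__:
        print('quantized:', name, type(module).__name__)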

Hi @Vasiliy_Kuznetsov

Please check the updated quantized_model now.

It seems to have quantized convolutions (line 100 onwards). I don't know why the layers before line 100 do not have quantized modules.

Do you think my QuantStub and DeQuantStub placement is incorrect?
Here's the model (with quant and dequant stubs).

Main script here: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_qat.py

I suspect my quant and dequant stub placement may be incorrect, but apart from that I've followed all the steps from the static quantization tutorial.
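
For reference, the placement pattern in the static quantization tutorial is a single QuantStub at the input and a single DeQuantStub at the output, so everything in between runs quantized after convert; a minimal sketch with an illustrative wrapper (not my actual detector):

import torch
import torch.nn as nn

class WrappedDetector(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.backbone = backbone
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # fp32 -> quantized at the model input
        x = self.backbone(x)    # inner modules see quantized tensors
        return self.dequant(x)  # quantized -> fp32 at the model output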

Really appreciate your help.

@Vasiliy_Kuznetsov any update on this?

Hi Raghav,

For post training quantization, we want the model to be in eval mode (see https://github.com/pytorch/pytorch/blob/530d48e93a3f04a5ec63a1b789c19a5f775bf497/torch/quantization/fuse_modules.py#L63). So, you can add a model.eval() call before you fuse modules:

model.eval()
torch.quantization.fuse_modules(...)
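
Putting the steps together, the eager mode post-training flow looks roughly like this (a sketch; modules_to_fuse and calibration_loader are placeholders):

model.eval()                              # eval mode before fusing and calibrating
fused = torch.quantization.fuse_modules(model, modules_to_fuse)
fused.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(fused, inplace=True)

with torch.no_grad():                     # calibration pass, no gradients needed
    for inputs in calibration_loader:
        fused(inputs)

quantized = torch.quantization.convert(fused, inplace=False)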

Hey @Vasiliy_Kuznetsov! I am also experiencing a similar issue, but only when quantising torch.nn.GRU with this script:

import torch
import torch.nn as nn
from torch.ao.quantization.qconfig_mapping import QConfigMapping
import torch.quantization.quantize_fx as quantize_fx
import copy

class UserModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.l = nn.GRU(128, 128, 128, batch_first=True, bidirectional=True)
    
    def forward(self, x):
        return self.l(x)

model_fp = UserModule()

model_to_quantize = copy.deepcopy(model_fp)
model_to_quantize.eval()
qconfig_mapping = QConfigMapping().set_global(torch.quantization.default_dynamic_qconfig)
# a tuple of one or more example inputs is needed to trace the model

model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, None)
model_quantized = quantize_fx.convert_fx(model_prepared)

def print_size_of_model(model):
    import os
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model_fp)
print_size_of_model(model_quantized)

The exact same script, but with

self.l = nn.GRU(128, 128, 128, batch_first=True, bidirectional=True)

works like a charm.

I am using torch 1.13, and, as listed here, nn.GRU should be supported with dynamic quantization.

Additional info that might be useful: I am able to quantise GRUCell, RNNCell and LSTMCell properly.

Any idea why the model is not getting quantised?

Thanks!

Cheers,
Francesco.

Hi @fpaissan , sorry for the late reply.

It looks like the FX graph mode quantization backend configuration is missing the dynamic quantization entry for torch.nn.GRU. I filed FX graph mode quant: backendconfig configuration missing for torch.nn.GRU · Issue #90394 · pytorch/pytorch · GitHub to track this. Our team can fix this.

@fpaissan, as a workaround you could try using the Eager mode torch.ao.quantization.quantize_dynamic API, which should support torch.nn.GRU.
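
Something along these lines (a sketch, reusing the names from your script above):

import torch
import torch.nn as nn

# dynamically quantize only the GRU layers: weights are stored as int8,
# activations are quantized on the fly at inference time
model_dq = torch.ao.quantization.quantize_dynamic(
    model_fp, qconfig_spec={nn.GRU}, dtype=torch.qint8
)
print_size_of_model(model_dq)  # size should drop if the GRU was swapped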