Hi
I am trying to quantize a text detection model based on MobileNet (model definition here).
After inserting the quant and dequant stubs, fusing all the conv+bn+relu and conv+relu blocks, and replacing cat with skip_add.cat(), I perform static quantization (script: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_quantization.py).
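For reference, the cat replacement uses nn.quantized.FloatFunctional, roughly like this (a sketch of the pattern, not the exact code from my model definition):

import torch.nn as nn

class Merge(nn.Module):  # illustrative module, not the actual East merge block
    def __init__(self):
        super(Merge, self).__init__()
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x, y):
        # FloatFunctional.cat carries its own observer, so the concat can be quantized
        return self.skip_add.cat([x, y], dim=1)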
After performing quantization, the model size doesn't go down (in fact, it increases):
Original Size:
Size (MB): 6.623636
Fused model Size:
Size (MB): 6.638188
Quantized model Size:
Size (MB): 7.928258
I have even printed the final quantized model here
I also changed the qconfig to fused_model.qconfig = torch.quantization.default_qconfig, but the quantized model size is still 6.715115 MB.
Why doesn't the model size reduce?
Looking at the model definition you posted, it looks like it is not yet quantized. One missing piece is calibration. You can add a calibration step after you call prepare and before you call convert:
torch.quantization.prepare(fused_model, inplace=True)

# calibrate your model by feeding it example inputs
for inputs in your_dataset:
    fused_model(inputs)

print('Quantized model Size:')
quantized = torch.quantization.convert(fused_model, inplace=False)
print_size_of_model(quantized)
Hi @Vasiliy_Kuznetsov
Thank you for your input. I have updated my script to pass a few images through the fused model as calibration inputs.
Please see the updated script here
import os
import config as cfg
from model import East
import torch
import utils
import preprossing
import cv2
import numpy as np
import time

def uninplace(model):
    # recursively set inplace=False on every submodule
    if hasattr(model, 'inplace'):
        model.inplace = False
    if not model.children():
        return
    for child in model.children():
        uninplace(child)

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    # ... (file truncated)
But the quantized model is still bigger than the original model:
Original Size:
Size (MB): 6.623636
Fused model Size:
Size (MB): 6.638188
Quantized model Size:
Size (MB): 6.712286
There seems to be some improvement due to the calibration, but the quantized model size is still not satisfactory compared to the original size.
Could you suggest what's going wrong here?
@Vasiliy_Kuznetsov
I also tried a script with Quantization Aware Training (QAT):
import os
import config as cfg
from model import East
import torch
import utils
import preprossing
import cv2
import numpy as np
import time
import loss

def uninplace(model):
    # recursively set inplace=False on every submodule
    if hasattr(model, 'inplace'):
        model.inplace = False
    if not model.children():
        return
    for child in model.children():
        uninplace(child)

def print_size_of_model(model):
    # ... (file truncated)
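The QAT part follows the standard flow, roughly like this (a sketch of the pattern, with train_loader standing in for my actual data loading and training step):

model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)

# fine-tune with fake-quant modules inserted
for inputs, targets in train_loader:
    outputs = model(inputs)
    # ... compute loss, backward, optimizer step ...

model.eval()
qat_model = torch.quantization.convert(model, inplace=False)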
But the quantized model is still bigger than the original model, and I don't know what's going wrong here:
Original Size:
Size (MB): 6.623636
Fused model Size:
Size (MB): 6.638188
Quantized model Size:
Size (MB): 6.712286
QAT model Size:
Size (MB): 6.712286
In the paste here (https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/quantized_model.txt), the model doesn't look quantized. One would expect to see QuantizedConv instead of Conv and QuantizedLinear instead of Linear. One thing to try could be to make sure to run the convert script and ensure that you see the quantized module equivalents afterwards.
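For example, something along these lines (a sketch; adjust the variable names to match your script):

quantized_model = torch.quantization.convert(fused_model, inplace=False)

# after convert, the printed model should show QuantizedConv2d / QuantizedLinear,
# and this check should find at least one quantized conv
print(quantized_model)
print(any(isinstance(m, torch.nn.quantized.Conv2d) for m in quantized_model.modules()))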
Hi @Vasiliy_Kuznetsov
Please check the updated quantized_model now:
Size (MB): 6.712286
DataParallel(
  (module): East(
    (mobilenet): MobileNetV2(
      (features): Sequential(
        (0): Sequential(
          (0): ConvBnReLU2d(
            (0): Conv2d(
              3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
              (activation_post_process): MinMaxObserver(min_val=-694.3411254882812, max_val=765.30712890625)
            )
            (1): BatchNorm2d(
              32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
              (activation_post_process): MinMaxObserver(min_val=-4.2157487869262695, max_val=4.755300998687744)
            )
            (2): ReLU(
              (activation_post_process): MinMaxObserver(min_val=0.0, max_val=4.755300998687744)
            )
          )
          (1): Identity()
          ... (output truncated)
It seems to have quantized convolutions (line 100 onwards). I don't know why the layers before line 100 do not have quantized modules.
Do you think my QuantStub and DeQuantStub placement is incorrect?
Here's the model (with quant and dequant stubs):
import torch.nn as nn
import math
import torch
import config as cfg
import utils
from torch.quantization import QuantStub, DeQuantStub

def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True)
    )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]
        # ... (file truncated)
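For reference, my understanding of the usual stub placement is to quantize once at the start of forward and dequantize once at the end, roughly like this (a generic sketch, not my actual East model):

class QuantizableNet(nn.Module):  # illustrative model, not East itself
    def __init__(self):
        super(QuantizableNet, self).__init__()
        self.quant = QuantStub()
        self.backbone = conv_bn(3, 32, 2)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # fp32 -> quantized at the model input
        x = self.backbone(x)
        return self.dequant(x)   # quantized -> fp32 at the model output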
Main script here - https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_qat.py
I suspect my quant and dequant stub placement may be incorrect, but apart from that I've followed all the steps in the static quantization tutorial.
Really appreciate your help.
@Vasiliy_Kuznetsov any update on this?
Hi Raghav,
For post-training quantization, we want the model to be in eval mode (see https://github.com/pytorch/pytorch/blob/530d48e93a3f04a5ec63a1b789c19a5f775bf497/torch/quantization/fuse_modules.py#L63). So, you can add a model.eval() call before you fuse modules:
model.eval()
torch.quantization.fuse_modules(...)
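Putting the whole post-training flow together, the order would roughly be (a sketch; modules_to_fuse and calibration_data are placeholders for the names in your script):

model.eval()                                            # eval mode before fusing
fused_model = torch.quantization.fuse_modules(model, modules_to_fuse)
fused_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(fused_model, inplace=True)

# calibrate with representative inputs
for inputs in calibration_data:
    fused_model(inputs)

quantized_model = torch.quantization.convert(fused_model, inplace=False)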