MobileNetV2 + SSDLite quantization results in a different model definition

I’m trying to quantize a MobileNetV2 + SSDLite model from https://github.com/qfgaohao/pytorch-ssd

I followed the post-training static quantization steps from the tutorial at https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html

Before quantization, the model definition looks like this:

SSD(
  (base_net): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU6(inplace=True)
        (6): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    **#Removed some stuff to stay under 32K characters**
    (5): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1))
  )
  (source_layer_add_ons): ModuleList()
)

Quantization is done with:

import torch

model.eval().to('cpu')                              # eval mode; quantized inference runs on CPU
model.fuse_model()                                  # model-specific layer fusion
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)     # insert observers
torch.quantization.convert(model, inplace=True)     # swap in quantized modules

After quantization, the model definition looks like this:

SSD(
  (base_net): Sequential(
    (0): Sequential(
      (0): QuantizedConv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), scale=1.0, zero_point=0, padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): QuantizedReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): QuantizedConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=32, bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): QuantizedReLU6(inplace=True)
        (3): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): QuantizedConv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): QuantizedReLU6(inplace=True)
        (3): QuantizedConv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), scale=1.0, zero_point=0, padding=(1, 1), groups=96, bias=False)
        (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): QuantizedReLU6(inplace=True)
        (6): QuantizedConv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0, bias=False)
        (7): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
 
    **#Removed some stuff to stay under 32K characters**
    (5): QuantizedConv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
  )
  (source_layer_add_ons): ModuleList()
)

The model size decreased from 14 MB to 4 MB, but with this new definition, how can I load the quantized model?

I’m trying the following and getting the error below:

# Saving
torch.save(q_model.state_dict(), project.quantized_trained_model_dir / file_name)

# Loading the saved quantized model
lq_model = create_mobilenetv2_ssd_lite(len(class_names), is_test=True)
lq_model.load(project.quantized_trained_model_dir / file_name)

# Error
RuntimeError: Error(s) in loading state_dict for SSD:
Unexpected key(s) in state_dict: "base_net.0.0.scale", "base_net.0.0.zero_point", "base_net.0.0.bias", "base_net.1.conv.0.scale", "base_net.1.conv.0.zero_point", "base_net.1.conv.0.bias", "base_net.1.conv.3.scale", "base_net.1.conv.3.zero_point", "base_net.1.conv.3.bias", "base_net.2.conv.0.scale"...

I understand that after quantization some layers change (Conv2d -> QuantizedConv2d), but does that mean I need two model definitions, one for the original and one for the quantized version?

This is a diff of the definitions.

Have you tried this: How do I save and load quantization model?

i.e., run the prepare and convert steps before loading the state_dict.

Also, I would expect conv + batchnorm + relu to be fused into QuantizedConvReLU2d, but I think you are using ReLU6, and fusion of conv + batchnorm + relu6 isn’t currently supported.
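If that’s the case, fuse_model can presumably only fold each Conv2d into the BatchNorm2d that follows it. A rough sketch of what that could look like for the first few blocks of your printout (the indices here are illustrative, taken from the printed structure above, not from the repo’s actual fuse_model):

import torch.quantization as tq

# Illustrative only: fuse Conv2d + BatchNorm2d pairs and leave ReLU6 unfused,
# since Conv+BN+ReLU6 fusion isn't supported. Run in eval() mode.
model.eval()
m = model.base_net
tq.fuse_modules(m[0], [['0', '1']], inplace=True)                     # stem conv + bn
tq.fuse_modules(m[1].conv, [['0', '1'], ['3', '4']], inplace=True)    # depthwise, pointwise
tq.fuse_modules(m[2].conv, [['0', '1'], ['3', '4'], ['6', '7']], inplace=True)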


Yeah, you’ll need to quantize lq_model after lq_model = create_mobilenetv2_ssd_lite(len(class_names), is_test=True) and before you load the state_dict from the quantized model.
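Something along these lines (a sketch; the fuse/prepare/convert steps have to match exactly what was done before the state_dict was saved):

# Sketch: rebuild the float model, repeat the same fuse/prepare/convert steps,
# and only then load the quantized state_dict into the converted module structure.
lq_model = create_mobilenetv2_ssd_lite(len(class_names), is_test=True)
lq_model.eval().to('cpu')
lq_model.fuse_model()
lq_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(lq_model, inplace=True)
torch.quantization.convert(lq_model, inplace=True)
# the repo's SSD.load() wraps load_state_dict()
lq_model.load(project.quantized_trained_model_dir / file_name)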

I see your point. I’ll try it and see what happens.

This worked, although there is around 1.6 seconds of overhead to quantize the vanilla model before loading the weights. Thank you!


@wassimseif: How satisfied were you with the results of the quantization? Where did you add the QuantStub and DeQuantStub calls in the forward pass? In particular, I was wondering whether you dequantize the localization and confidence predictions at the very end of the model or whether you dequantize earlier.

Quantization alone resulted in a huge drop in mAP, but doing calibration while quantizing gave the same mAP as the original model, so make sure to explore calibration as well. The QuantStub and DeQuantStub were added directly in the model, before and after the forward pass.
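Roughly what I mean by calibration (a sketch; calib_loader is just a placeholder for whatever representative data you run through the prepared model):

# Sketch of post-training static quantization with a calibration pass.
# `calib_loader` is a placeholder for a loader of representative images.
model.eval().to('cpu')
model.fuse_model()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)

with torch.no_grad():
    for images, *_ in calib_loader:    # observers record activation ranges
        model(images)

torch.quantization.convert(model, inplace=True)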

Thanks a lot for your kind and informative reply, @wassimseif!

When I tried quantization alone, the mAP also dropped enormously, and I wondered whether I was doing it correctly. I’ll try calibration too, thanks for the suggestion!

Just to make sure though: at the end of the forward pass, are there then two DeQuantStubs, one for the locations and one for the confidences? Initially I thought I should dequantize right after the base net, because I suspected that such fine-grained localization cannot be done well with 8 bits and requires the more expressive 32-bit representation.

You don’t need two DeQuantStubs; one is enough, and you can reuse it for both outputs. Yes, dequantizing after the base net might work, but it would leave the SSD layers unquantized. In my case fp16/fp32 instructions were not available, so I had to quantize the whole model.
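As a sketch of what I mean (a hypothetical wrapper for illustration, not the actual repo code):

import torch
import torch.nn as nn

class QuantizableSSD(nn.Module):
    # Hypothetical wrapper: one QuantStub at the input and a single DeQuantStub
    # reused for both SSD outputs.
    def __init__(self, ssd):
        super().__init__()
        self.ssd = ssd
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)                      # fp32 -> quantized at the input
        confidences, locations = self.ssd(x)   # the whole SSD runs quantized
        return self.dequant(confidences), self.dequant(locations)  # same stub reused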

Thanks again for the comment, @wassimseif!