Quantization Error During Concat -- RuntimeError: Didn't find kernel to dispatch to for operator 'aten::_cat'

Raghav_Gurbaxani · November 6, 2019, 5:33am

@hx89
Thank you for your advice. I tried placing the QuantStub after slice4 in basenet (line 157) and the DeQuantStub at the end. I also set the qconfig of slice1-4 to None.

https://github.com/raghavgurbaxani/experiments/blob/master/partial_quantized_craft.py


# -*- coding: utf-8 -*-
from collections import namedtuple
import torch.nn.init as init
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.vgg import model_urls
from torchvision import models

#from basenet.vgg16_bn import vgg16_bn, init_weights

def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            init.xavier_uniform_(m.weight.data)
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)

[File preview truncated.]

But now I get the error:
RuntimeError: All dtypes must be the same. (quantized_cat at /Users/distiller/project/conda/conda-bld/pytorch_1570710797334/work/aten/src/ATen/native/quantized/cpu/qconcat.cpp:59)

raised by self.skip_add.cat() (line 88).

My guess is that it’s trying to concatenate fp32 and int8 tensors, hence the problem.
I tried moving my QuantStub around, but my network has a lot of concat layers, so I always run into this problem.

Any ideas on how to deal with this issue?
Thanks again for your help so far.

hx89 · November 6, 2019, 6:53pm

There are a couple of things I noticed in your partial_quantized_craft.py:

1. In line 103: y=self.basenet.dequant(y), it would be better to define the dequant in the CRAFT class and use that instead of the dequant from basenet.
2. The error occurs because you moved quant() down, so h_relu2_2, for example, became a float tensor. You can add a quant stub so that the output is still int8:

...
h_relu2_2 = h
...
h_relu2_2_int8 = self.quant2(h_relu2_2)
...
out = vgg_outputs(h_fc7, h_relu5_3, h_relu4_3, h_relu3_2, h_relu2_2_int8)

Note that you can’t reuse the same quant stub; you need to create a new one, since each quant stub will have different quantization parameters.
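To illustrate why each quant stub ends up with its own parameters, here is a minimal pure-Python sketch of min/max affine quantization to uint8. It is not PyTorch's observer or kernel code; the helper names (`qparams`, `quantize`, `dequantize`, `max_roundtrip_error`) and the sample activations are made up for illustration.

```python
# Simplified affine (asymmetric) uint8 quantization; an illustration only,
# not PyTorch's implementation.

def qparams(values, qmin=0, qmax=255):
    """Derive (scale, zero_point) from the observed min/max of `values`."""
    lo = min(min(values), 0.0)  # keep 0.0 inside the range so it maps to an integer
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:            # all-zero input: any positive scale works
        scale = 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax]."""
    return [min(qmax, max(qmin, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q, scale, zero_point):
    """Map integers back to (approximate) floats."""
    return [(x - zero_point) * scale for x in q]


def max_roundtrip_error(values, scale, zero_point):
    """Largest absolute error after quantize -> dequantize."""
    restored = dequantize(quantize(values, scale, zero_point), scale, zero_point)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical activations: one narrow-range feature map, one wide-range one.
narrow = [0.0, 0.1, 0.25, 0.5]
wide = [0.0, 10.0, 40.0, 100.0]

own = qparams(narrow)      # parameters calibrated on `narrow` itself
borrowed = qparams(wide)   # parameters calibrated on a different tensor

err_own = max_roundtrip_error(narrow, *own)
err_borrowed = max_roundtrip_error(narrow, *borrowed)
print(err_own, err_borrowed)
```

With these sample values, the round-trip error for `narrow` is on the order of 0.001 with its own parameters and above 0.1 with the parameters borrowed from `wide`, which is the kind of loss a shared quant stub would cause.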

3. Could you first try quantizing only vgg16_bn to see how the accuracy looks? Since vgg16_bn is the dominant part of the computation, quantizing it alone already gives most of the performance gain. If the accuracy is good, you can then move on to quantizing the outer class. To quantize only vgg16_bn, you can do the following:

    def forward(self, X):
        X=self.quant(X)
        h = self.slice1(X)
        h_relu2_2 = h
        h = self.slice2(h)
        h_relu3_2 = h
        h = self.slice3(h)
        h_relu4_3 = h
        h = self.slice4(h)
        h=self.quant(h)
        h_relu5_3 = h
        h = self.slice5(h)
        h_fc7 = h

        h_fc7 = self.dequant1(h_fc7)
        h_relu5_3 = self.dequant2(h_relu5_3)
        h_relu4_3 = self.dequant3(h_relu4_3)
        h_relu3_2 = self.dequant4(h_relu3_2)
        h_relu2_2 = self.dequant5(h_relu2_2)

        vgg_outputs = namedtuple("VggOutputs", ['fc7', 'relu5_3', 'relu4_3', 'relu3_2', 'relu2_2'])
        out = vgg_outputs(h_fc7, h_relu5_3, h_relu4_3, h_relu3_2, h_relu2_2)
        return out

raghuramank100 · November 7, 2019, 12:36am

Hi Raghav,
I see one more error. You are using the same float functional module at multiple locations:

https://github.com/raghavgurbaxani/experiments/blob/master/partial_quantized_craft.py#L83


    init_weights(self.conv_cls.modules())
    self.skip_add = nn.quantized.FloatFunctional()
    
def forward(self, x):
    """ Base network """
    
    sources = self.basenet(x)


    """ U network """
    #y = torch.cat([sources[0], sources[1]], dim=1)
    y=self.skip_add.cat([sources[0], sources[1]], dim=1)
    y = self.upconv1(y)


    y = F.interpolate(y, size=sources[2].size()[2:], mode='bilinear', align_corners=False)
    #y = torch.cat([y, sources[2]], dim=1)
    y=self.skip_add.cat([y, sources[2]], dim=1)
    
    y = self.upconv2(y)


    y = F.interpolate(y, size=sources[3].size()[2:], mode='bilinear', align_corners=False)
    #y = torch.cat([y, sources[3]], dim=1)

and https://github.com/raghavgurbaxani/experiments/blob/master/partial_quantized_craft.py#L88 etc. This will cause the activations to be quantized incorrectly. A float functional module can be used only once as each module collects statistics on activations. Can you make all of them unique?
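The effect of reusing one statistics-collecting module at several call sites can be sketched with a toy running min/max observer. This is plain Python, not PyTorch's observer classes; `RunningMinMax` and the sample concat outputs are hypothetical.

```python
class RunningMinMax:
    """Toy stand-in for an activation observer: tracks min/max of all inputs seen."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, values):
        # Every call widens the recorded range; nothing is tracked per call site.
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))
        return values

    def scale(self, levels=255):
        # Step size if the recorded range is spread over `levels` integer steps.
        return (self.hi - self.lo) / levels


# Hypothetical outputs of two different concat call sites during calibration.
cat1_out = [0.0, 0.2, 0.4, 0.6]
cat2_out = [-30.0, 0.0, 30.0, 60.0]

# One module reused at both call sites: the ranges get merged.
shared = RunningMinMax()
shared.observe(cat1_out)
shared.observe(cat2_out)

# One module per call site: each keeps its own range.
unique1, unique2 = RunningMinMax(), RunningMinMax()
unique1.observe(cat1_out)
unique2.observe(cat2_out)

print(shared.scale(), unique1.scale(), unique2.scale())
```

In this sketch the shared observer ends up with the range [-30, 60], so the first call site would be quantized with a step size more than 100 times coarser than it needs.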

Raghav_Gurbaxani · November 7, 2019, 1:28am

@hx89 Thank you so much for your advice. Based on points 1 and 2, I tried several configurations and they worked much better.

The model size was reduced from 84 MB to 36 MB (quant() placed after slice 2 in vgg).

Here’s another result from a 75 MB model (quant() placed after slice 4 in vgg).

I am still trying other configurations to improve my results.

In the meantime, I also want to try your configuration (quantize vgg16_bn only). Could you explain why your code has two quant() calls and five dequant() calls? Which parts of the qconfig must be set to None?

Thanks again for your help

Raghav_Gurbaxani · November 7, 2019, 1:29am

@raghuramank100 thanks for pointing that out

hx89 · November 7, 2019, 2:14am

It’s great to see the accuracy getting better!

Were these results obtained after fixing the issue @raghuramank100 pointed out?

For option 3 there’s a typo; there should be only one quant(), as follows:

    def forward(self, X):
        X=self.quant(X)
        h = self.slice1(X)
        h_relu2_2 = h
        h = self.slice2(h)
        h_relu3_2 = h
        h = self.slice3(h)
        h_relu4_3 = h
        h = self.slice4(h)
        h_relu5_3 = h
        h = self.slice5(h)
        h_fc7 = h

        h_fc7 = self.dequant(h_fc7)
        h_relu5_3 = self.dequant(h_relu5_3)
        h_relu4_3 = self.dequant(h_relu4_3)
        h_relu3_2 = self.dequant(h_relu3_2)
        h_relu2_2 = self.dequant(h_relu2_2)

        vgg_outputs = namedtuple("VggOutputs", ['fc7', 'relu5_3', 'relu4_3', 'relu3_2', 'relu2_2'])
        out = vgg_outputs(h_fc7, h_relu5_3, h_relu4_3, h_relu3_2, h_relu2_2)
        return out

Also, we may not need five dequant() stubs. Previously I thought that, since each output activation has a different distribution, we would need five of them so that each dequant collects statistics for its specific output activation. But in fact the input to dequant is already an int8 activation carrying its own qparams, and dequant has no state, so we can share a single dequant().

If you make the changes above, you can set the qconfig at the model.basenet level instead of the model level:
model.basenet.qconfig = torch.quantization.QConfig(activation=torch.quantization.default_histogram_observer,weight=torch.quantization.default_per_channel_weight_observer)

I think this way you don’t need to set any qconfig to be None and PyTorch will only quantize the basenet.

raghuramank100 · November 7, 2019, 6:23pm

Yes, dequants can be shared as they are stateless. quant() cannot be shared as it collects statistics.
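A small pure-Python sketch of why dequantization can be shared: the quantized tensor carries its own scale and zero point, so the dequantize step needs no state of its own. The `QTensor` tuple and `dequantize` function are illustrative stand-ins, not PyTorch types.

```python
from collections import namedtuple

# Illustrative stand-in for a quantized tensor: integer values plus the
# quantization parameters that travel with them.
QTensor = namedtuple("QTensor", ["values", "scale", "zero_point"])


def dequantize(qt):
    """Stateless: everything needed comes from the tensor itself."""
    return [(v - qt.zero_point) * qt.scale for v in qt.values]


# Two hypothetical activations quantized with different parameters.
a = QTensor(values=[0, 128, 255], scale=0.5, zero_point=0)
b = QTensor(values=[0, 128, 255], scale=0.25, zero_point=128)

# The same function serves both, because it reads the parameters per tensor.
out_a = dequantize(a)
out_b = dequantize(b)
print(out_a, out_b)
```

The quantize direction is different: the parameters have to be chosen from observed float data first, which is the statistics collection that makes each quant stub specific to one tensor.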

Raghav_Gurbaxani · November 7, 2019, 7:08pm

@hx89
Thanks for your help. I tried quantizing only the VGG16 basenet part as per your suggestion, and the network compressed from 84 MB to 28 MB. Here’s the result -

Although bounding boxes are well aligned, it completely misses out on ‘23’. I still need to figure out the optimum configuration for quantization.
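As a rough sanity check on the 84 MB to 28 MB figure, the arithmetic below assumes the file size is dominated by fp32 weights (4 bytes each) that become int8 (1 byte each) when quantized, and it ignores the storage for scales, zero points, and any other state. The function names are hypothetical and the result is only a back-of-the-envelope estimate.

```python
FP32_BYTES = 4
INT8_BYTES = 1


def estimated_size_mb(float_size_mb, quantized_fraction):
    """Estimated size when `quantized_fraction` of the weight bytes go fp32 -> int8."""
    kept_fp32 = float_size_mb * (1.0 - quantized_fraction)
    as_int8 = float_size_mb * quantized_fraction * INT8_BYTES / FP32_BYTES
    return kept_fp32 + as_int8


def implied_quantized_fraction(float_size_mb, observed_size_mb):
    """Invert the estimate: which fraction of the weights must have been quantized?"""
    saved = float_size_mb - observed_size_mb
    max_saving = float_size_mb * (1.0 - INT8_BYTES / FP32_BYTES)
    return saved / max_saving


full = estimated_size_mb(84, 1.0)            # everything quantized
frac = implied_quantized_fraction(84, 28)    # fraction implied by the 28 MB result
print(full, frac)
```

Under these assumptions a fully quantized 84 MB model would be about 21 MB, and 28 MB corresponds to roughly 89% of the weight bytes being quantized, which is consistent with the basenet holding most of the parameters.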

Do you think training on these quantized weights for a few epochs might help? Are there any other quantization improvements I can try?

Thanks again.

hx89 · November 7, 2019, 9:23pm

I think you are very close to the accuracy of the float model. Next, you can try skipping quantization for the Conv layers in basenet one by one until the accuracy is acceptable.

Another possible way to improve accuracy is quantization-aware training, which is similar to the idea you mentioned. There’s a reference script in torchvision you can take a look at:

https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py

from __future__ import print_function
import datetime
import os
import time
import sys
import copy

import torch
import torch.utils.data
from torch import nn
import torchvision
import torch.quantization
import utils
from train import train_one_epoch, evaluate, load_data


def main(args):

    if args.output_dir:
        utils.mkdir(args.output_dir)

[File preview truncated.]

hx89 · November 27, 2019, 6:46am

Hi @Raghav_Gurbaxani, I just want to check whether you were able to achieve acceptable accuracy for the quantized model. It would be great if you could share some updates.

Raghav_Gurbaxani · June 28, 2020, 10:13pm

Hey @hx89 @raghuramank100, thank you for all your help. The static and dynamic quantization worked well. I am trying out quantization-aware training now.

I am trying to quantize a text detection model based on MobileNet (model definition here).

After inserting the quant and dequant stubs, fusing all the conv+bn+relu and conv+relu sequences, and replacing cat with skip_add.cat(), I perform static quantization (script: https://github.com/raghavgurbaxani/Quantization_Experiments/blob/master/try_quantization.py).

After quantization, the model size doesn’t go down (in fact, it increases):

Original Size:
Size (MB): 6.623636

Fused model Size:
Size (MB): 6.638188

Quantized model Size:
Size (MB): 7.928258

I have even printed the final quantized model here.

I changed the qconfig to fused_model.qconfig = torch.quantization.default_qconfig, but the quantized model size is still Size (MB): 6.715115.

Why doesn’t the model size go down?

bhaskar_rao · April 25, 2021, 6:43pm

Hi @Raghav_Gurbaxani, I am also trying to quantize the CRAFT model. Could you share your awesome work?

jerryzh168 · April 27, 2021, 12:58am

That is unexpected. Could you print the model before and after quantization? It looks like the one in Quantization_Experiments/quantized_model.txt (raghavgurbaxani/Quantization_Experiments, master branch) has only part of the model quantized?