Linear Batchnorm 1D fuse modules for quantization aware training

Hi, I'm running a MobileNet variant with GDC (global depthwise convolution) as the last layer.

My GDC layer looks like this:

from torch.nn import Module, Linear, BatchNorm1d, Flatten
# Linear_block is a custom depthwise-conv block defined elsewhere in my code.

class GDC(Module):
    def __init__(self, embedding_size):
        super(GDC, self).__init__()
        self.conv_6_dw = Linear_block(512, 512, groups=512, kernel=(7,7), stride=(1, 1), padding=(0, 0))
        self.conv_6_flatten = Flatten()
        self.linear = Linear(512, embedding_size, bias=False)
        self.bn = BatchNorm1d(embedding_size)

    def forward(self, x):
        x = self.conv_6_dw(x)
        x = self.conv_6_flatten(x)
        x = self.linear(x)
        x = self.bn(x)
        return x

When I run it, the code gets stuck at the line x = self.linear(x) with this error:
RuntimeError: Expected self.scalar_type() == ScalarType::Float to be true, but got false.

I think the problem is that I cannot fuse these modules, because fuse_modules only supports [Conv, Relu], [Conv, BatchNorm], [Conv, BatchNorm, Relu], and [Linear, Relu] (not [Linear, BatchNorm1d]), so self.linear(x) is still using FloatType instead of IntType.
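To illustrate what I mean by fusing: as far as I understand, fusing [Linear, BatchNorm1d] would amount to folding the BatchNorm statistics into the Linear weights. Something roughly like this (just a sketch of the inference-time math, not something I have working for QAT):

import torch

def fold_bn_into_linear(linear, bn):
    # Sketch: fold an eval-mode BatchNorm1d into the preceding Linear layer.
    # y = gamma * (W x + b - mean) / sqrt(var + eps) + beta
    #   = (gamma / sqrt(var + eps)) * (W x + b) + (beta - gamma * mean / sqrt(var + eps))
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused = torch.nn.Linear(linear.in_features, linear.out_features, bias=True)
    fused.weight.data = linear.weight * scale.unsqueeze(1)
    bias = linear.bias if linear.bias is not None else torch.zeros(linear.out_features)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias
    return fused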

How can I customize my network so that the model is trainable?
Sorry for any misunderstanding or inconvenience.

@manhntm3 Even without fusion, the linear operation can be quantized. You can control which operations are quantized by inserting QuantStub/DeQuantStub around them.
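For example (a rough sketch based on your GDC above, not tested against your exact model), you could keep the linear quantized and run only the BatchNorm1d in float by surrounding it with DeQuantStub/QuantStub:

from torch import nn
from torch.quantization import QuantStub, DeQuantStub

class GDC(nn.Module):
    def __init__(self, embedding_size):
        super(GDC, self).__init__()
        # Linear_block is the same custom depthwise-conv block as in your code.
        self.conv_6_dw = Linear_block(512, 512, groups=512, kernel=(7,7), stride=(1, 1), padding=(0, 0))
        self.conv_6_flatten = nn.Flatten()
        self.linear = nn.Linear(512, embedding_size, bias=False)
        self.dequant = DeQuantStub()   # leave the quantized domain before the BatchNorm1d
        self.bn = nn.BatchNorm1d(embedding_size)
        self.quant = QuantStub()       # re-enter the quantized domain afterwards (drop this if the next op is already a DeQuantStub)

    def forward(self, x):
        x = self.conv_6_dw(x)
        x = self.conv_6_flatten(x)
        x = self.linear(x)     # stays quantized after convert
        x = self.dequant(x)
        x = self.bn(x)         # runs in float
        x = self.quant(x)
        return x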

Could you share the code you use to quantize/transform the model for quantization?

@supriyar Thank you for your answer.
My model code looks like this:

class GDC(Module):
    def __init__(self, embedding_size):
        super(GDC, self).__init__()
        self.conv_6_dw = Linear_block(512, 512, groups=512, kernel=(7,7), stride=(1, 1), padding=(0, 0))
        self.conv_6_flatten = Flatten()
        self.linear = Linear(512, embedding_size, bias=False)
        self.bn = BatchNorm1d(embedding_size)

    def forward(self, x):
        x = self.conv_6_dw(x)
        x = self.conv_6_flatten(x)
        x = self.linear(x)
        x = self.bn(x)
        return x

from torch.quantization import QuantStub, DeQuantStub

class NetronNet(Module):
    def __init__(self, embedding_size):
        super(NetronNet, self).__init__()
        self.quant = QuantStub()
        self.conv = ...  # (some convolutional layers)
        self.linear = GDC(embedding_size)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.linear(x)
        x = self.dequant(x)
        return x

    def fuse_model(self):
        ...  # (fuse self.conv layers)
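For reference, the fuse_model for the conv part is just the usual fuse_modules call, sketched here with made-up submodule names inside self.conv:

import torch

def fuse_model(self):
    # Hypothetical example: fuse one Conv2d/BatchNorm2d/ReLU triple inside self.conv.
    # The names '0', '1', '2' are placeholders for whatever the real submodules are called.
    torch.quantization.fuse_modules(self.conv, [['0', '1', '2']], inplace=True)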

Here is the code I use to quantize the model:

    netron.to(cpu_device)
    netron.train()
    netron.fuse_model()
    netron.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
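
After setting the qconfig I follow the usual QAT recipe (sketched roughly here, the actual fine-tuning loop is omitted):

    torch.quantization.prepare_qat(netron, inplace=True)    # attach observers / fake-quant modules

    # ... fine-tune for a few epochs ...

    netron.eval()
    netron_int8 = torch.quantization.convert(netron)        # swap modules for their quantized counterparts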

To elaborate: after the fuse_model call my conv layers work just fine, but the GDC module (linear with batchnorm) fails to quantize even after I add QuantStub()/DeQuantStub() layers inside the module. So how do I know when a layer has been quantized/transformed correctly?
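For instance, is the right way simply to print the submodules after prepare_qat and after convert and check their types? Something like:

import torch

torch.quantization.prepare_qat(netron, inplace=True)
print(netron.linear)   # the GDC submodule: QAT-prepared layers should show fake-quant/observer children here

netron.eval()
netron_int8 = torch.quantization.convert(netron)
print(type(netron_int8.linear.linear))   # torch.nn.quantized.Linear if the inner Linear was converted, plain nn.Linear if it was skipped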