Quantize biases to quint8 for model deployment


I have built a simple convolutional net and would like to quantize it for deployment.

Here is the code I am using for quantization:

import torch

def quantize_net(net):
    # Fusion and calibration must run in eval mode
    net.eval()

    for module_name, module in net.named_children():
        if module_name in ['conv1', 'conv3_1', 'conv4_1', 'conv5_pa']:
            torch.quantization.fuse_modules(module, ['conv', 'bn', 'activation'], inplace=True)
        elif module_name in ['conv2', 'conv3_2', 'conv4_2']:
            for submodule_name, submodule in module.named_children():
                if submodule_name in ['depthwise', 'pointwise']:
                    torch.quantization.fuse_modules(submodule, ['conv', 'bn', 'activation'], inplace=True)
        elif module_name in ['conv5_pb']:
            torch.quantization.fuse_modules(module, ['conv', 'bn'], inplace=True)

    net.qconfig = torch.quantization.get_default_qconfig('fbgemm')
    torch.quantization.prepare(net, inplace=True)

    # Calibration pass to collect activation statistics
    net(torch.randint(low=0, high=255, size=(1000, 1, 40, 40), dtype=torch.float32))

    torch.quantization.convert(net, inplace=True)

All the convolution weights are quantized to 8-bit integers, but the biases stay in fp32. I assume that during inference the biases are cast to int32 and added to the output of the conv operator. However, I would like to quantize the biases to quint8 as well.
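This is easy to reproduce on a toy model (`TinyNet` below is a hypothetical minimal example, not my actual net): after `convert`, a quantized conv exposes its packed parameters through the `weight()` and `bias()` methods, and the dtypes show weights quantized but the bias still fp32.

```python
import torch
import torch.nn as nn

# Hypothetical minimal model, just to reproduce the observation
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

net = TinyNet().eval()
net.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(net, inplace=True)
net(torch.randn(8, 1, 10, 10))  # calibration pass
torch.quantization.convert(net, inplace=True)

# Quantized modules expose packed parameters via weight()/bias()
print(net.conv.weight().dtype)  # torch.qint8 (weights are quantized)
print(net.conv.bias().dtype)    # torch.float32 (bias is not)
```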

Is this supported at the moment? Is there a way I could achieve it?

Thanks! :slight_smile:

Hi @dalnoguer

We currently don’t support quantizing biases to quint8. The main reason is that our quantized backend (fbgemm) expects fp32 biases.

If you wish to use quint8 biases, you would have to write your own custom operator that accepts biases in this format, along with the necessary kernel to do the computation in int8.
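To illustrate just the bookkeeping side: producing a quint8 bias tensor is already possible today with `torch.quantize_per_tensor` (the scale and zero point below are made-up example values; in a real custom op you would derive them from your calibration statistics). The missing piece is the conv kernel that would consume it.

```python
import torch

# Affine-quantize an fp32 bias to quint8 (toy values).
bias_fp32 = torch.tensor([0.05, -0.12, 0.30])
qbias = torch.quantize_per_tensor(bias_fp32, scale=0.01, zero_point=128,
                                  dtype=torch.quint8)

print(qbias.int_repr())    # stored uint8 values
print(qbias.dequantize())  # recovered fp32 values, approximately bias_fp32
```

Note this only produces the tensor; fbgemm's conv kernels will not accept it, which is why a custom operator (and matching kernel) would be needed.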