I have built a simple convolutional net and would like to quantize it for deployment.
Here is the code used for quantization.
```python
import torch

def quantize_net(net):
    for module_name, module in net.named_children():
        # Plain conv blocks: fuse conv + bn + activation into one module
        if module_name in ['conv1', 'conv3_1', 'conv4_1', 'conv5_pa']:
            torch.quantization.fuse_modules(
                module, ['conv', 'bn', 'activation'], inplace=True)
        # Depthwise-separable blocks: fuse each of the two sub-branches
        elif module_name in ['conv2', 'conv3_2', 'conv4_2']:
            for submodule_name, submodule in module.named_children():
                if submodule_name in ['depthwise', 'pointwise']:
                    torch.quantization.fuse_modules(
                        submodule, ['conv', 'bn', 'activation'], inplace=True)
        # Final block has no activation, so only conv + bn are fused
        elif module_name in ['conv5_pb']:
            torch.quantization.fuse_modules(module, ['conv', 'bn'], inplace=True)

    net.qconfig = torch.quantization.get_default_qconfig('fbgemm')
    torch.quantization.prepare(net, inplace=True)
    # Calibrate the observers with representative 40x40 single-channel inputs
    net(torch.randint(low=0, high=255, size=(1000, 1, 40, 40), dtype=torch.float32))
    torch.quantization.convert(net, inplace=True)
```
All of the convolution weights are quantized to qint8, but the biases stay in fp32. I assume that during inference the biases are quantized to int32 on the fly and added to the int32 output of the conv operator. However, I would like to quantize the biases to 8 bits as well.
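To illustrate what I am seeing, here is a self-contained toy sketch (separate from my actual net; the `Tiny` module, layer names, and input shape are purely illustrative). It reproduces the dtypes after `convert` and emulates my understanding of the on-the-fly int32 bias quantization, where the bias scale is `input_scale * weight_scale`:

```python
import torch
import torch.nn as nn

# Toy model for illustration only; my real net is the one fused above.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 8, 3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

m = Tiny().eval()
m.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(m, inplace=True)
m(torch.randn(32, 1, 40, 40))           # calibration pass
torch.quantization.convert(m, inplace=True)

w, b = m.conv.weight(), m.conv.bias()
print(w.dtype, b.dtype)                 # torch.qint8 torch.float32

# Emulate the backend's bias handling: requantize to int32 with
# zero_point 0 and scale = input_scale * weight_scale (fbgemm uses
# per-channel weight scales, hence quantize_per_channel here).
in_scale = m.quant.scale.item()
scales = in_scale * w.q_per_channel_scales()
b_int32 = torch.quantize_per_channel(
    b, scales=scales,
    zero_points=torch.zeros(len(scales), dtype=torch.long),
    axis=0, dtype=torch.qint32)
print(b_int32.dtype)                    # torch.qint32
```

If I understand correctly, the int32 bias is derived at inference time rather than stored quantized, which would explain why the state_dict still shows it as fp32.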
Is this supported at the moment? Is there a way I could achieve it?