Hello,
I have built a simple convolutional net and would like to quantize it for deployment.
Here is the code I use for quantization:
```python
def quantize_net(net):
    # Fuse conv + bn (+ activation) blocks; fusion expects the model
    # to already be in eval mode.
    for module_name, module in net.named_children():
        if module_name in ['conv1', 'conv3_1', 'conv4_1', 'conv5_pa']:
            torch.quantization.fuse_modules(module, ['conv', 'bn', 'activation'], inplace=True)
        elif module_name in ['conv2', 'conv3_2', 'conv4_2']:
            for submodule_name, submodule in module.named_children():
                if submodule_name in ['depthwise', 'pointwise']:
                    torch.quantization.fuse_modules(submodule, ['conv', 'bn', 'activation'], inplace=True)
        elif module_name in ['conv5_pb']:
            torch.quantization.fuse_modules(module, ['conv', 'bn'], inplace=True)

    net.qconfig = torch.quantization.get_default_qconfig('fbgemm')
    torch.quantization.prepare(net, inplace=True)
    # Calibration pass with random input data
    net(torch.randint(low=0, high=255, size=(1000, 1, 40, 40), dtype=torch.float32))
    torch.quantization.convert(net, inplace=True)
```
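For context, my understanding of the per-tensor affine scheme that `convert` applies to 8-bit tensors is sketched below (the `scale` and `zero_point` values are made up for illustration, not taken from my model):

```python
# Sketch of per-tensor affine quantization to an unsigned 8-bit range:
#   q = clamp(round(x / scale) + zero_point, qmin, qmax)
# scale and zero_point here are illustrative placeholders.
def quantize_affine(x, scale, zero_point, qmin=0, qmax=255):
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

print(quantize_affine(0.5, scale=0.01, zero_point=128))   # 178
print(quantize_affine(-5.0, scale=0.01, zero_point=128))  # clamped to 0
```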
All the convolution weights are quantized to quint8, but the biases stay in fp32. I assume that during inference the biases are cast to int32 and added to the output of the conv operator. However, I would like to quantize the biases to quint8 as well.
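To make my assumption concrete, here is a rough sketch of what I believe the quantized kernel does with the fp32 bias at runtime: requantize it to int32 using the product of the input and weight scales (all values below are made up for illustration):

```python
# Illustrative sketch: the fp32 bias is requantized to int32 at runtime
# with bias_scale = input_scale * weight_scale, so it can be added
# directly to the int32 accumulator of the conv.
# The scales and bias values are placeholders, not from a real model.
input_scale = 0.5
weight_scale = 0.02
bias_fp32 = [0.1, -0.25, 0.7]

bias_scale = input_scale * weight_scale
bias_int32 = [round(b / bias_scale) for b in bias_fp32]
print(bias_int32)  # [10, -25, 70]
```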
Is this supported at the moment? Is there a way I could achieve it?
Thanks!