The results of quantized convolution differ from manual implementation

Hello everyone on the forum

1. Background
I have been working on quantization using the x86 backend and have successfully obtained an 8-bit quantized model with various parameters. Now, I am looking to deploy this quantized model on an embedded device. As a result, I need to manually implement all the quantization, convolution, and dequantization operations. However, I have encountered some issues while implementing the convolution operation. I would greatly appreciate it if anyone could kindly assist me.

2. Problem
The code is as follows. I have defined a model and performed quantization. Finally, I manually implemented the convolution quantization process. However, after multiple runs, the results consistently vary. Despite numerous attempts and repeated runs, the outcomes remain consistently unsuccessful.

import torch
import torch.nn as nn


class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv1(x)
        x = self.dequant(x)
        return x


net = CustomModel()
net.eval()

backend = "x86"
net.qconfig = torch.quantization.get_default_qconfig(backend)

print(net.qconfig)
torch.quantization.prepare(net, inplace=True)

calibrate_data = torch.randint(low=0, high=255, size=(1, 4, 16), dtype=torch.uint8).unsqueeze(0)
calibrate_data = calibrate_data / 255
_ = net(calibrate_data)

torch.quantization.convert(net, inplace=True)

channel1 = torch.arange(0, 9).view(3, 3).to(torch.int8).unsqueeze(0)
input_data = channel1.unsqueeze(0)
input_data = input_data / 255
activations = []


def custom_hook(module, input, output):
    info = {
        'module': module,
        'input': input,
        'output': output
    }
    activations.append(info)


for name, module in net.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(custom_hook)

_ = net(input_data)

# manual implementation
qx = activations[1]['input'][0].int_repr()
wx = activations[1]['module'].weight().int_repr()

sinput = activations[1]['input'][0].q_scale()
zinput = activations[1]['input'][0].q_zero_point()

sweight = activations[1]['module'].weight().q_per_channel_scales()
zweight = activations[1]['module'].weight().q_per_channel_zero_points()

soutput = activations[1]['module'].scale
zoutput = activations[1]['module'].zero_point

bias = activations[1]['module'].bias()
qbias = torch.round(bias / (sinput * sweight))

for i in range(3):
    qoutput = torch.sum((qx - zinput) * (wx - zweight[i])) + qbias[i]
    qoutput = torch.round(qoutput * sinput * sweight[i] / soutput + zoutput)
    qoutput = torch.clamp(qoutput, 0, 256)

    print(qoutput)
    print(activations[1]['output'][0][i].int_repr())

    print((activations[1]['output'][0][i].int_repr() == qoutput).sum())

3. Running result

tensor(63., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[61]], dtype=torch.uint8)
tensor(0)
tensor(62., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[61]], dtype=torch.uint8)
tensor(0)
tensor(72., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[73]], dtype=torch.uint8)
tensor(0)

4. My various attempts to solve this problem

4.1

The problem discussed in this post is similar to mine, but despite attempting the suggested solutions, I was unable to resolve the issue. My code examples and attempts were primarily based on the discussions in that post.

qoutput = qx * wx + qbias

The code provided in Rocket Xu’s answer does not subtract the zero-point (zpoint), and I also attempted to run the code without subtracting the zpoint. However, the results were incorrect.
4.2

This post is also similar to my problem, although it relates to linear layers. Upon observation, I noticed that the implementation in the post is quite similar to mine.

After many attempts, I have no other way, can give me some suggestions?

Hi @Cao_kim, have you taken a look at Executorch? It is a library that can be used to run PyTorch programs on device and supports quantization. See Quantization Overview — ExecuTorch 0.2 documentation for more details

Thank you for your response, but I need to implement the processes of quantization, convolution, and dequantization using C++ in order to facilitate deployment on embedded devices. It seems that this may not be suitable.