Hello everyone,
1. Background
I have been working on quantization with the x86 backend and have successfully obtained an 8-bit quantized model together with its quantization parameters. I now want to deploy this quantized model on an embedded device, so I need to implement the quantization, convolution, and dequantization operations by hand. However, I have run into a problem with the convolution operation, and I would greatly appreciate any help.
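For context, the per-tensor affine quantize and dequantize steps that have to be reproduced on the device can be sketched as follows. This is a minimal pure-Python sketch: the function names `quantize` and `dequantize`, and the scale and zero-point values, are made up for illustration and are not taken from the model in this post.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an unsigned 8-bit integer (affine quantization)."""
    q = round(x / scale) + zero_point
    # Saturate to the representable range of the quantized type.
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return scale * (q - zero_point)


# Hypothetical parameters, chosen only for illustration.
scale, zero_point = 0.05, 10
x = 1.23

q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q, x_hat)
```

The round trip is lossy: for values inside the representable range, the reconstruction error is at most half of the scale.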
2. Problem
The code is as follows. I define a model, quantize it, and then manually reimplement the quantized convolution. However, across many attempts and repeated runs, my manually computed outputs never match the outputs of the quantized model.
import torch
import torch.nn as nn


class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv1(x)
        x = self.dequant(x)
        return x


net = CustomModel()
net.eval()

# Post-training static quantization with the x86 backend
backend = "x86"
net.qconfig = torch.quantization.get_default_qconfig(backend)
print(net.qconfig)
torch.quantization.prepare(net, inplace=True)

# Calibrate with one random input of shape (1, 1, 4, 16), scaled to [0, 1)
calibrate_data = torch.randint(low=0, high=255, size=(1, 4, 16), dtype=torch.uint8).unsqueeze(0)
calibrate_data = calibrate_data / 255
_ = net(calibrate_data)
torch.quantization.convert(net, inplace=True)

# Test input of shape (1, 1, 3, 3), so the 3x3 convolution yields one pixel per channel
channel1 = torch.arange(0, 9).view(3, 3).to(torch.int8).unsqueeze(0)
input_data = channel1.unsqueeze(0)
input_data = input_data / 255

# Record the input and output of every leaf module (quant, conv1, dequant)
activations = []


def custom_hook(module, input, output):
    info = {
        'module': module,
        'input': input,
        'output': output
    }
    activations.append(info)


for name, module in net.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(custom_hook)

_ = net(input_data)

# manual implementation (activations[1] is the quantized conv1)
qx = activations[1]['input'][0].int_repr()          # uint8 input, shape (1, 1, 3, 3)
wx = activations[1]['module'].weight().int_repr()   # int8 weights, shape (3, 1, 3, 3)
sinput = activations[1]['input'][0].q_scale()
zinput = activations[1]['input'][0].q_zero_point()
sweight = activations[1]['module'].weight().q_per_channel_scales()
zweight = activations[1]['module'].weight().q_per_channel_zero_points()
soutput = activations[1]['module'].scale
zoutput = activations[1]['module'].zero_point
bias = activations[1]['module'].bias()
qbias = torch.round(bias / (sinput * sweight))      # bias rounded to accumulator units

for i in range(3):
    # Accumulate, add the bias, requantize to the output scale, then clamp
    qoutput = torch.sum((qx - zinput) * (wx - zweight[i])) + qbias[i]
    qoutput = torch.round(qoutput * sinput * sweight[i] / soutput + zoutput)
    qoutput = torch.clamp(qoutput, 0, 255)  # quint8 range is [0, 255]
    print(qoutput)
    print(activations[1]['output'][0][i].int_repr())
    print((activations[1]['output'][0][i].int_repr() == qoutput).sum())
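As a cross-check for the arithmetic above, here is a minimal pure-Python sketch of a per-channel quantized convolution for a single output pixel. All numbers are made up for illustration and are not taken from the model in this post, and `conv_output_channel` is a hypothetical helper name. Two points in the sketch are worth comparing against the manual implementation: each output channel uses only its own kernel slice (`wx[i]` in the notation above), and the bias is kept in floating point instead of being rounded first. The second point is an assumption about how a backend might handle the bias, not something confirmed here.

```python
def conv_output_channel(qx, wx_c, z_in, z_w, s_in, s_w, s_out, z_out, bias):
    """Quantized value of one output pixel for one output channel.

    qx   : flattened quantized input patch (unsigned ints)
    wx_c : flattened quantized kernel of THIS output channel only (signed ints)
    """
    # Integer accumulation with both zero points removed.
    acc = sum((x - z_in) * (w - z_w) for x, w in zip(qx, wx_c))
    # Bias expressed in accumulator units (scale s_in * s_w), kept as a float
    # here (assumption: no separate rounding of the bias).
    acc_with_bias = acc + bias / (s_in * s_w)
    # Requantize to the output scale and zero point, then clamp to uint8.
    q = round(acc_with_bias * (s_in * s_w) / s_out) + z_out
    return max(0, min(255, q))


# Made-up example values (not taken from the model in this post).
qx = [0, 1, 2, 3, 4, 5, 6, 7, 8]       # flattened 3x3 quantized input patch
kernels = [                             # one flattened 3x3 kernel per output channel
    [1, 0, -1, 2, 0, -2, 1, 0, -1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 5, 0, 0, 0, 0],
]
z_in, s_in = 1, 0.5                     # input zero point and scale
z_w = [0, 0, 0]                         # per-channel weight zero points
s_w = [0.25, 0.125, 0.5]                # per-channel weight scales
bias = [1.0, -0.5, 0.25]                # float bias per output channel
s_out, z_out = 0.5, 10                  # output scale and zero point

outputs = [
    conv_output_channel(qx, kernels[c], z_in, z_w[c], s_in, s_w[c], s_out, z_out, bias[c])
    for c in range(3)
]
print(outputs)  # [10, 12, 18]
```

The scales are powers of two so that every intermediate value is exact and the result can be checked by hand.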
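3. Running result
For each of the three output channels, the script prints my manual result, the quantized model's output, and the number of matching elements (0 in every case):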
tensor(63., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[61]], dtype=torch.uint8)
tensor(0)
tensor(62., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[61]], dtype=torch.uint8)
tensor(0)
tensor(72., dtype=torch.float64, grad_fn=<ClampBackward1>)
tensor([[73]], dtype=torch.uint8)
tensor(0)
4. My various attempts to solve this problem
4.1
The problem discussed in this post is similar to mine, but despite attempting the suggested solutions, I was unable to resolve the issue. My code examples and attempts were primarily based on the discussions in that post.
qoutput = qx * wx + qbias
The code in Rocket Xu's answer does not subtract the zero point. I also tried running my code without subtracting the zero points, but the results were still incorrect.
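To see when the zero points can be left out, the following short derivation may help. It assumes the usual affine convention $x \approx s_x (q_x - z_x)$ with per-channel weight parameters $s_{w,c}$ and $z_{w,c}$; the symbols are chosen here to mirror the variable names in the code above and are not taken from the linked post.

```latex
\begin{aligned}
y_c &= \sum_k x_k \, w_{c,k} + b_c \\
    &\approx s_x \, s_{w,c} \sum_k (q_{x,k} - z_x)(q_{w,c,k} - z_{w,c}) + b_c \\
    &= s_x \, s_{w,c} \sum_k \left( q_{x,k} q_{w,c,k} - z_x q_{w,c,k} - z_{w,c} q_{x,k} + z_x z_{w,c} \right) + b_c \\
q_{y,c} &= \operatorname{clamp}\!\left( \operatorname{round}\!\left( \frac{y_c}{s_y} \right) + z_y,\; 0,\; 255 \right)
\end{aligned}
```

The plain product $q_x q_w$ matches the full expression only when $z_x = 0$ and $z_{w,c} = 0$, so dropping the zero points is valid only in that special case.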
4.2
This post is also similar to my problem, although it concerns linear layers. Its implementation looks very close to mine.
After many attempts, I have run out of ideas. Could anyone give me some suggestions?