How does quantized conv2d handle scale and zero_point?

Dear PyTorch community,

For my research, I am re-implementing the convolution function in my own code. However, my manual conv2d result does not match what's given by torch.nn.quantized.functional. I would appreciate any advice on how a convolution works on 3-dimensional inputs and kernels under quantized parameters.

For a specific example, I've been working on the first conv layer of a ResNet50 model. I have a quantized input image of shape (1, 3, 224, 224). It is padded so that the input to the conv layer is (1, 3, 230, 230). The convolution has 64 kernels of shape (3, 7, 7) and a stride of 2, which gives an output of (64, 112, 112). Basically, I'm trying to compare my result, manual_res, with the element output_ref taken from the same location of the reference output tensor at every position of the kernel. The indexing shouldn't be the problem here, because I tried qF.conv2d on the same two tensors and it matches output_ref. Here's what I've written as the convolution loop.

import torch
import torch.nn.functional as F
from torch.nn.quantized import functional as qF

conv1_pad = (3, 3, 3, 3)
after_pad = F.pad(after_quant, conv1_pad, "constant", 0)  # input to the conv layer
print("after pad: ", after_pad.shape)   # 1, 3, 230, 230

my_conv1_result = torch.zeros(after_conv1.shape)  # after_conv1 is the reference output of the quantized conv1 layer

for c in range(0, conv1_weight.shape[0]):  # 64 output channels
    kernel = conv1_weight[c]  # 3x7x7 kernel
    target_y = 0  # index in result tensor
    
    for start_y in range(0, after_pad.shape[2] - 7, 2):  # 112
        target_x = 0
#         print(start_y, end=", ")
        for start_x in range(0, after_pad.shape[3] - 7, 2):  # 112
            input_tensor = after_pad[0, :, start_x:start_x + 7, start_y:start_y + 7]  # 3x7x7
            manual_res   = torch.tensor(0, dtype=torch.int8)  # integer accumulator (activations are uint8)
            output_ref   = after_conv1[0, c, target_x, target_y]
        
#             print(input_tensor.int_repr())
#             print(kernel.int_repr())
            print("output_ref:", output_ref.int_repr(), end="  =====  ")
      
            for i in range(kernel.shape[0]):  # 3
                for j in range(kernel.shape[1]):  # 7
                    for k in range(kernel.shape[2]):  # 7
                        #####################
                        # Multiply and accumulate
                        temp = (input_tensor.int_repr()[i, j, k] - input_tensor.q_zero_point()) * (kernel.int_repr()[i, j, k] - kernel.q_zero_point())
                        manual_res = manual_res + temp
                        #####################


            manual_res = conv1.zero_point + (manual_res * (input_tensor.q_scale() * kernel.q_scale() / conv1.scale)).round()
            manual_res = 255 if manual_res > 255 else 0 if manual_res < 0 else manual_res
            print("manual_res:", manual_res, end="  =====  ")
            my_conv1_result[0, c, target_x, target_y] = manual_res
            
            qf_conv_res = qF.conv2d(
                input_tensor.reshape((1, 3, 7, 7)),
                kernel.reshape((1, 3, 7, 7)),
                bias=torch.tensor([0], dtype=torch.float),
                scale=conv1.scale,
                zero_point=conv1.zero_point,
            )
            # conv1 is the first conv layer with its scale and zp
            print("qF.conv2d ref:", qf_conv_res.int_repr())
        
            target_x += 1
            
        target_y += 1

The printed output shows a mismatch between my manual_res and output_ref / the qF.conv2d reference.

output_ref: tensor(66, dtype=torch.uint8)  =====  manual_res: tensor(75.)  =====  qF.conv2d ref: tensor([[[[66]]]], dtype=torch.uint8)
output_ref: tensor(66, dtype=torch.uint8)  =====  manual_res: tensor(79.)  =====  qF.conv2d ref: tensor([[[[66]]]], dtype=torch.uint8)
output_ref: tensor(66, dtype=torch.uint8)  =====  manual_res: tensor(64.)  =====  qF.conv2d ref: tensor([[[[66]]]], dtype=torch.uint8)

I think the problem lies in the handling of scales and zero_points inside the loop. I am referring to this paper: https://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

It states that

q3 = Z3 + M * sum_j (q1_j - Z1) * (q2_j - Z2),   with M = S1 * S2 / S3

In my case, q3 would be manual_res; Z1-Z3 and S1-S3 are the input's, kernel's, and conv1's zero_points and scales; and q1, q2 are elements from the input and kernel. I am desperate to know why my implementation is not correct. (A side note: I tried my implementation on random 2-dimensional tensors with the same handling of scales and zero_points, and it seems to work fine.)
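To make sure I'm reading the formula the right way, here is a minimal sketch of it on a single output element, with made-up scales and zero_points (S1/Z1 for the input, S2/Z2 for the kernel, S3/Z3 for the output; none of these come from my actual model):

import torch

S1, Z1 = 0.05, 3    # input scale / zero_point (made up)
S2, Z2 = 0.02, 0    # weight scale / zero_point (made up)
S3, Z3 = 0.10, 2    # output scale / zero_point (made up)

q1 = torch.randint(0, 256, (3, 7, 7), dtype=torch.int32)     # input patch, uint8 values
q2 = torch.randint(-127, 128, (3, 7, 7), dtype=torch.int32)  # kernel, int8 values

acc = torch.sum((q1 - Z1) * (q2 - Z2))   # int32 accumulation
M = S1 * S2 / S3                         # requantization multiplier
q3 = int(torch.round(acc * M)) + Z3      # add the output zero_point after scaling
q3 = max(0, min(255, q3))                # saturate to the uint8 range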

Again, thanks for your help!

I think what we have is this (copying from our internal design doc):

z = qconv(wq, xq)
# z is at scale (weight_scale*input_scale) and stored as int32
# quantize the bias to int32 at that same scale and do a 32-bit add
bias_q = round(bias / (input_scale * weight_scale))
z_int = z + bias_q
# requantize (round) to 8 bits
z_out = round(z_int * (input_scale * weight_scale) / output_scale) + z_zero_point
z_out = saturate(z_out)
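Here is a rough, self-contained version of that recipe in Python, checked against torch.nn.quantized.functional.conv2d on random data. The scales and zero_points below are made up, per-tensor weight quantization and no padding are assumed, and the result can still differ from the backend by ±1 because of rounding details, so treat it as a sketch rather than the exact kernel implementation:

import torch
import torch.nn.functional as F
from torch.nn.quantized import functional as qF

# made-up quantization parameters
input_scale, input_zp = 0.05, 64
weight_scale, weight_zp = 0.02, 0
output_scale, output_zp = 0.1, 32

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
bias = torch.randn(4)

xq = torch.quantize_per_tensor(x, input_scale, input_zp, torch.quint8)
wq = torch.quantize_per_tensor(w, weight_scale, weight_zp, torch.qint8)

# reference result from the quantized functional op
ref = qF.conv2d(xq, wq, bias, stride=1, padding=0,
                scale=output_scale, zero_point=output_zp)

# z = qconv(wq, xq): integer conv at scale input_scale * weight_scale
x_int = xq.int_repr().to(torch.int32) - input_zp
w_int = wq.int_repr().to(torch.int32) - weight_zp
z = F.conv2d(x_int.float(), w_int.float()).round().to(torch.int32)

# bias_q = round(bias / (input_scale * weight_scale)); 32-bit add
bias_q = torch.round(bias / (input_scale * weight_scale)).to(torch.int32)
z_int = z + bias_q.reshape(1, -1, 1, 1)

# requantize to 8 bits and saturate
z_out = torch.round(z_int * (input_scale * weight_scale) / output_scale) + output_zp
z_out = torch.clamp(z_out, 0, 255).to(torch.uint8)

print((z_out.int() - ref.int_repr().int()).abs().max())  # expect 0, or 1 on rounding boundaries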

In my quantized model, I have the following layer:
QuantizedConvReLU2d(192, 192, kernel_size=(3, 3), stride=(2, 2), scale=0.06603512912988663, zero_point=0, padding=(5, 1), groups=192).
Is the scale=0.06603512912988663 here output_scale or something else? Thanks!

Yeah, this is the output scale.
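A quick way to convince yourself (minimal sketch; `qconv` stands for that QuantizedConvReLU2d module and `xq` for a suitably shaped quantized input): the module's scale/zero_point are the quantization parameters of its output tensor.

# sketch: `qconv` is the QuantizedConvReLU2d module above, `xq` a quantized input
yq = qconv(xq)
print(yq.q_scale(), qconv.scale)            # both 0.06603512912988663
print(yq.q_zero_point(), qconv.zero_point)  # both 0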

Currently in the PyTorch implementation, if I use Eager Mode Quantization, a quantized convolution will perform the convolution between the quantized input and the quantized weight and add a bias of dtype fp32, right? As described in (Prototype) PyTorch 2.0 Export Post Training Static Quantization:

Q/DQ Representation (default)

def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
    x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
        x_int8, x_scale, x_zero_point, x_quant_min, x_quant_max, torch.int8)
    weight_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
        weight_int8, weight_scale, weight_zero_point, weight_quant_min, weight_quant_max, torch.int8)
    weight_permuted = torch.ops.aten.permute_copy.default(weight_fp32, [1, 0])
    out_fp32 = torch.ops.aten.addmm.default(bias_fp32, x_fp32, weight_permuted)
    out_int8 = torch.ops.quantized_decomposed.quantize_per_tensor(
        out_fp32, output_scale, output_zero_point, out_quant_min, out_quant_max, torch.int8)
    return out_int8

This representation of quantization accounts for the weights and activations, but not the bias.

Is there any quantization flow that will perform a quantized convolution according to the algorithm you wrote above?

Yes, we also offer an option to do this with DerivedQuantizationSpec; see the tutorial here: How to Write a Quantizer for PyTorch 2 Export Quantization — PyTorch Tutorials 2.1.0+cu121 documentation
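Roughly, the idea from that tutorial is to give the bias a DerivedQuantizationSpec whose scale is derived from the activation and weight observers (scale = act_scale * weight_scale, int32 dtype). The sketch below follows that pattern; the node arguments and the make_bias_qspec helper are placeholders for whatever conv node you annotate in your captured graph:

import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec

def derive_bias_qparams_fn(obs_or_fqs):
    # expects the observers/fake-quants of (input activation, weight)
    act_obs, weight_obs = obs_or_fqs
    act_scale, _ = act_obs.calculate_qparams()
    weight_scale, _ = weight_obs.calculate_qparams()
    bias_scale = torch.tensor([float(act_scale) * float(weight_scale)], dtype=torch.float32)
    bias_zero_point = torch.tensor([0], dtype=torch.int32)
    return bias_scale, bias_zero_point

def make_bias_qspec(input_act_node, weight_node, conv_node):
    # placeholder nodes from the exported graph you are annotating
    return DerivedQuantizationSpec(
        derived_from=[(input_act_node, conv_node), (weight_node, conv_node)],
        derive_qparams_fn=derive_bias_qparams_fn,
        dtype=torch.int32,
        quant_min=-(2 ** 31),
        quant_max=2 ** 31 - 1,
        qscheme=torch.per_tensor_symmetric,
    )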