Reproduce qconv kernel for x86

Hi, I am using x86_inductor_quantizer following the given tutorial. I want to implement this on an FPGA. However, I am having trouble finding the x86 kernel C++/Python implementation of these quantized qconv ops, which are supposed to be defined in torch.ops.qconv; going through /ATen/native/quantized/library.cpp wasn't much help.

Here is my understanding of the whole quantized conv execution for x86_inductor.
Data types:
weights: qint8 [-128 to 127]
input/output: quint8 [0-255]
Bias: float32, converted to bias_q: int32 vector
Multiplication: int32
Accumulation: int32
Bias addition: int32
input scale: float32
input zero point: int32
per-channel weight scales: float32 vector
output scale: float32
output zero point: int32

I used symmetric quantization, so all zero points are actually zero.

Here is a Python version in this gist. It includes my understanding of bias_q from this thread.
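To make this concrete, here is a minimal NumPy sketch of the arithmetic I describe above (stride 1, no padding, all zero points 0 because of symmetric quantization). The names, the bias_q line, and the exact rounding are my own assumptions, not necessarily what the x86 kernel does:

```python
import numpy as np

def qconv2d_ref(x_q, x_scale, w_q, w_scales, bias_fp32, out_scale):
    """Reference quantized conv: stride 1, no padding, all zero points 0.

    x_q: uint8 input, shape (N, C, H, W), quantized with x_scale
    w_q: int8 weights, shape (K, C, R, S), per-channel scales w_scales (K,)
    bias_fp32: float32 bias, shape (K,)
    Returns a uint8 output quantized with out_scale and zero point 0.
    """
    N, C, H, W = x_q.shape
    K, _, R, S = w_q.shape
    OH, OW = H - R + 1, W - S + 1

    # Bias pre-quantized to int32 with scale = input_scale * per-channel weight scale.
    bias_q = np.round(bias_fp32 / (x_scale * w_scales)).astype(np.int32)

    # int32 multiplication and int32 accumulation.
    acc = np.zeros((N, K, OH, OW), dtype=np.int32)
    xi = x_q.astype(np.int32)
    wi = w_q.astype(np.int32)
    for i in range(OH):
        for j in range(OW):
            patch = xi[:, :, i:i + R, j:j + S]                   # (N, C, R, S)
            acc[:, :, i, j] = (patch[:, None] * wi[None]).sum(axis=(2, 3, 4))
    acc += bias_q[None, :, None, None]                           # int32 bias add

    # Requantize: real value = acc * input_scale * weight_scale; rescale to the
    # output scale, round, and clamp to the quint8 range [0, 255] (zero point 0).
    requant = acc * (x_scale * w_scales[None, :, None, None]) / out_scale
    return np.clip(np.round(requant), 0, 255).astype(np.uint8)
```

The parts I am least sure about are the bias_q line and the rounding mode of the final requantization.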

Can someone please help me sort this out, either by helping me verify the gist (i.e., ideas on how I can make sure it mimics the x86 kernel implementation so I can write FPGA code for it), or by pointing me to the kernel implementation itself?

I want to use x86 rather than XNNPACK because it produces [0-255] outputs, which makes my life easier since my activation function is ReLU.
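Concretely, because the output zero point is 0 with symmetric quantization, the clamp to [0, 255] that happens during requantization already acts as the ReLU, so nothing extra is needed on the FPGA side. A small sketch of that idea (my own names, not kernel code):

```python
import numpy as np

def requantize_relu_quint8(acc_i32, combined_scale, out_scale):
    # combined_scale = input_scale * per-channel weight scale.
    # With output zero point 0, clamping to [0, 255] is exactly ReLU plus saturation.
    q = np.round(acc_i32 * combined_scale / out_scale)
    return np.clip(q, 0, 255).astype(np.uint8)
```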

Many thanks in advance.
Gurkirt

x86 conv ops can be found here, I think: pytorch/aten/src/ATen/native/quantized/library.cpp at dcfa415e6e10b250e56a1793d45e886fd910358e · pytorch/pytorch · GitHub, and the tests can be found here: pytorch/test/quantization/core/test_quantized_op.py at dcfa415e6e10b250e56a1793d45e886fd910358e · pytorch/pytorch · GitHub

Thank you @jerryzh168 for your answer. I see that test_quantized_op.py and library.cpp point to wrappers around the backend kernel; what I want to see is how the backend actually implements qconv.
Would it be right to say that the x86 backend uses the oneDNN backend, and that the compute call ends up here: oneDNN/src/cpu/gemm_x8s8s32x_convolution.cpp at main · oneapi-src/oneDNN · GitHub?
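For what it is worth, I can at least check which quantized engine my build selects (this only shows the dispatch choice, it does not trace into oneDNN itself):

```python
import torch

# Engines compiled into this build and the one currently active.
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'onednn', 'x86', 'fbgemm']
print(torch.backends.quantized.engine)             # active engine, e.g. 'x86'
```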

Hi @jerryzh168, I made some progress. I am getting almost the same result as one would get after executing the graph module.
However, the manual quantized execution output differs from it, per output pixel, by sum: 0.1534857451915741, mean: 0.00213174638338387, max: 0.006238460540771484, min: 3.343820571899414e-05, with bias_quantized = True.
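(For context, those numbers are the sum/mean/max/min of the absolute per-output-pixel difference between my manual execution and the graph module output, computed roughly as below; out_manual and out_graph are placeholder names for my two outputs:)

```python
import torch

# Placeholders for my manual reference output and the graph module output.
out_manual = torch.rand(1, 8, 16, 16)
out_graph = torch.rand(1, 8, 16, 16)

diff = (out_manual - out_graph).abs()
print("sum:", diff.sum().item(), "mean:", diff.mean().item(),
      "max:", diff.max().item(), "min:", diff.min().item())
```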

Can you please have a look at this gist and see if there is a mistake, or whether there is something in the GEMM that might be causing this difference? It works with PyTorch 2.4.0+cpu.

I set quantized bias to True, as this is what you mentioned earlier in another thread.

If anyone is stumbling onto this, I think I figured it out; you can have a look at GitHub - gurkirt/pt2e_quantize_bias for the answer/tests.