Reproduce qconv kernel for x86

Hi @jerryzh168, I made some progress. I am now getting almost the same result as one would get from executing the graph module.
However, with bias_quantized = True, the outputs of the manual quantized execution still differ from the graph module's outputs per output pixel by: sum: 0.1534857451915741, mean: 0.00213174638338387, max: 0.006238460540771484, min: 3.343820571899414e-05.
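
For context, here is a minimal sketch of how statistics like these could be computed, assuming they are taken over the absolute element-wise difference between the two outputs. The helper name `diff_stats` and the toy values are hypothetical and are not taken from the gist. With PyTorch tensors, the inputs could be passed as `tensor.flatten().tolist()`.

```python
def diff_stats(reference, manual):
    """Summarise the absolute element-wise difference of two flat sequences."""
    if len(reference) != len(manual):
        raise ValueError("outputs must have the same number of elements")
    if not reference:
        raise ValueError("outputs must not be empty")
    # One absolute difference per output element (e.g. per output pixel).
    abs_diff = [abs(r - m) for r, m in zip(reference, manual)]
    total = sum(abs_diff)
    return {
        "sum": total,
        "mean": total / len(abs_diff),
        "max": max(abs_diff),
        "min": min(abs_diff),
    }


# Toy values only (hypothetical), not the outputs from the gist.
stats = diff_stats([0.5, 1.0, -0.25], [0.5, 0.75, -0.5])
print(stats)
```

A minimum that is strictly positive, as in the numbers above, would mean that no output element matches exactly under this measure.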

Could you please have a look at this gist and see if there is a mistake, or if something in gemm might be causing this difference? It works with pytorch2.4.0+cpu.

I set the quantized bias to True, as this is what you mentioned earlier in another thread.