Quantized convolution and NHWC layout

Hi, I’m confused by the comment in the FbgemmConv function below:

It says that quantized convolution operates on NHWC layout and that the output is kept in NHWC. But since the input is assumed to be in NCHW layout, there has to be a conversion step from NHWC back to NCHW. Where does this layout conversion happen?

I’m comparing the outputs of quantized convolution in PyTorch with the same operations translated to and executed on TVM (via TVM’s WIP torch frontend). Even though the operations are just quantize, qconv, and dequantize, the two results don’t match. I’m trying to figure out where the difference comes from.
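
For context, here is roughly the pipeline I’m comparing, reduced to a self-contained sketch (the shapes, scales, and zero points below are placeholders, not the values from my actual model):

```python
# Sketch of the PyTorch side: quantize -> quantized conv2d -> dequantize.
import torch
import torch.nn.quantized.functional as qF

torch.manual_seed(0)

x_fp32 = torch.randn(1, 3, 8, 8)   # input, logical NCHW layout
w_fp32 = torch.randn(4, 3, 3, 3)   # OIHW weights
bias = torch.zeros(4)              # bias stays in fp32 for quantized conv

# Quantize activations to quint8 and weights to qint8 (per-tensor).
x_q = torch.quantize_per_tensor(x_fp32, scale=0.05, zero_point=128, dtype=torch.quint8)
w_q = torch.quantize_per_tensor(w_fp32, scale=0.02, zero_point=0, dtype=torch.qint8)

# Quantized convolution; the output scale/zero_point are picked arbitrarily here.
y_q = qF.conv2d(x_q, w_q, bias, stride=1, padding=1,
                scale=0.1, zero_point=128, dtype=torch.quint8)

y_fp32 = y_q.dequantize()
print(y_fp32.shape)  # torch.Size([1, 4, 8, 8]) -- still NCHW from the caller's view
```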

cc @raghuramank100 .

I think the logical layout (the tensor’s shape) is still NCHW; using memory_format we only re-arrange the physical memory layout, so that it helps the kernel implementations.
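
For example, here is a minimal sketch of that shape-vs-stride distinction using a plain float tensor (the quantized path relies on the same memory_format mechanism):

```python
# The shape (logical layout) stays NCHW; only the strides change, so the data
# is laid out channels-last (NHWC) in memory without any logical transpose.
import torch

t = torch.randn(1, 4, 8, 8)                              # logical NCHW
t_cl = t.contiguous(memory_format=torch.channels_last)   # physical NHWC

print(t_cl.shape)     # torch.Size([1, 4, 8, 8]) -- unchanged logical layout
print(t.stride())     # (256, 64, 8, 1) -- NCHW-contiguous strides
print(t_cl.stride())  # (256, 1, 32, 4) -- channels vary fastest, i.e. NHWC in memory
print(t_cl.is_contiguous(memory_format=torch.channels_last))  # True
```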