Hi, I’m confused by the comment in the FbgemmConv function below:
It says quantized convolution operates on the NHWC layout and that the output is kept in NHWC. But since the input is assumed to be in NCHW layout, there has to be a conversion step from NHWC back to NCHW. Where does this layout conversion happen?
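Conceptually, the conversion I’m asking about is just a transpose of the activation tensor between the two layouts. A minimal NumPy sketch (purely illustrative, not the actual FBGEMM code path) of the NCHW ↔ NHWC round trip:

```python
import numpy as np

# NCHW activation: batch=1, channels=2, height=3, width=4
x_nchw = np.arange(24, dtype=np.float32).reshape(1, 2, 3, 4)

# NCHW -> NHWC: move the channel axis to the end
x_nhwc = x_nchw.transpose(0, 2, 3, 1)

# NHWC -> NCHW: move the channel axis back to position 1
x_back = x_nhwc.transpose(0, 3, 1, 2)

assert x_nhwc.shape == (1, 3, 4, 2)
assert np.array_equal(x_nchw, x_back)  # round trip is lossless
```

So if the kernel computes in NHWC but callers see NCHW, a transpose like `x_back` must happen somewhere after the conv, and I can’t find it.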
I’m comparing the outputs of quantized convolution in PyTorch against the same operations translated to and executed on TVM (via TVM’s WIP Torch frontend). Even though the operations are just quantize, qconv, and dequantize, the two results don’t match. I’m trying to figure out where the difference comes from.
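For what it’s worth, one common source of small mismatches between quantized backends is the rounding in the quantize/requantize step itself: if two backends round differently (e.g. half-to-even vs half-away-from-zero), results can differ by one quantum. A hypothetical NumPy sketch of per-tensor affine quantization, just to show the error magnitude I’d consider “expected” (the helper names here are my own, not a PyTorch or TVM API):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # np.round uses round-half-to-even; a backend using a different
    # rounding mode can land one quantum away on ties
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 0.1, 0.25, 1.0], dtype=np.float32)
scale, zp = 0.01, 0
q = quantize(x, scale, zp)
y = dequantize(q, scale, zp)

# round-trip error is bounded by half a quantum (scale / 2)
assert np.max(np.abs(x - y)) <= scale / 2
```

If the mismatch I’m seeing is larger than roughly one quantum per element, I’d suspect the layout handling above rather than rounding.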