Quantized convolution and NHWC layout

masahi · February 3, 2020, 10:13am

Hi, I’m confused by the comment in the FbgemmConv function below:

pytorch/pytorch/blob/ecbf6f99e6a4e373105133b31534c9fb50f2acca/aten/src/ATen/native/quantized/cpu/qconv.cpp#L268-L276


// Quantized kernels are all written with NHWC (channels last) layout in
// mind. Ideally, we'd be compatible with conv2d behavior and preserve the
// inputs layout as is (doing necessary upconversions).
//
// However, to be more robust, for now we just force output layout to always
// be NHWC (channels last), thus opportunistically improving perf.
//
// This might change when full memory format support lands
// See https://github.com/pytorch/pytorch/issues/23403

It says that quantized convolution operates on the NHWC layout and that the output is kept in NHWC. But since the input is assumed to be in NCHW layout, there has to be a conversion step from NHWC back to NCHW. Where does this layout conversion happen?

I’m comparing the outputs of quantized convolution in PyTorch with the same operations translated to and executed on TVM (via TVM’s WIP torch frontend). Even though the operations are just quantize, qconv, and dequantize, the two results don’t match. I’m trying to figure out where the difference comes from.

hx89 · February 3, 2020, 9:51pm

cc @raghuramank100 .

jerryzh168 · February 14, 2020, 7:25pm

I think the logical layout is still NCHW; with memory_format we are only re-arranging the physical memory layout (to NHWC) so that it helps the kernel implementations.
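To make the logical vs. physical distinction concrete, here is a minimal pure-Python sketch. It does not call PyTorch; it assumes the usual channels-last convention in which the dimension order seen by the user stays (N, C, H, W) and only the strides change. The helper names (`contiguous_strides`, `channels_last_strides`, `offset`) are made up for illustration. In PyTorch itself the corresponding quantities can be inspected with `x.shape`, `x.stride()` and `x.is_contiguous(memory_format=torch.channels_last)`.

```python
def contiguous_strides(sizes):
    """Row-major strides (in elements) for dimensions stored in the given order."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return tuple(strides)


def channels_last_strides(n, c, h, w):
    """Strides of a logically NCHW tensor whose memory is laid out as NHWC.

    Assumption: this mirrors the channels-last convention, where the
    logical dimension order is unchanged and only strides differ.
    """
    sn, sh, sw, sc = contiguous_strides((n, h, w, c))
    return (sn, sc, sh, sw)  # reported in logical (N, C, H, W) order


def offset(index, strides):
    """Physical element offset of a logical (n, c, h, w) index."""
    return sum(i * s for i, s in zip(index, strides))


shape = (1, 3, 2, 2)                    # logical N, C, H, W in both cases
nchw = contiguous_strides(shape)        # plain contiguous tensor
nhwc = channels_last_strides(*shape)    # same logical shape, NHWC memory

# The same logical element (n=0, c=1, h=0, w=1) lives at different offsets.
idx = (0, 1, 0, 1)
print(nchw, offset(idx, nchw))
print(nhwc, offset(idx, nhwc))
```

Under this assumption no explicit "NHWC back to NCHW" conversion is needed for indexing: code that indexes the output as `out[n, c, h, w]` sees the same values either way. A mismatch would only show up if the raw buffer is read while assuming contiguous NCHW strides.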