NCHW layout as default for a new accelerator device

iviarcio · June 9, 2021, 12:40pm

Hello everyone,
I’m trying to build a backend for a new accelerator device in Glow.
This device only supports the NCHW layout (for convolution, etc.). I used the OpenCL backend as a base, but after optimizations I still see transpose operations in the HIR, as well as convolutions in NHWC layout. What am I missing? The models I'm using (lenet-mnist, resnet50) come from ONNX, in NCHW layout.
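To make the layout difference concrete, here is a small illustration (added for this thread, not Glow code) of what a layout Transpose does to the data. The same element (n, c, h, w) sits at a different flat offset in NCHW and NHWC, and converting NHWC to NCHW is the dimension permutation {0, 3, 1, 2}. It assumes dense, row-major float tensors with no padding or alignment. The function names `offsetNCHW`, `offsetNHWC` and `nhwcToNchw` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (n, c, h, w) in a dense NCHW tensor.
std::size_t offsetNCHW(std::size_t n, std::size_t c, std::size_t h,
                       std::size_t w, std::size_t C, std::size_t H,
                       std::size_t W) {
  return ((n * C + c) * H + h) * W + w;
}

// Flat offset of the same element in a dense NHWC tensor.
std::size_t offsetNHWC(std::size_t n, std::size_t c, std::size_t h,
                       std::size_t w, std::size_t C, std::size_t H,
                       std::size_t W) {
  return ((n * H + h) * W + w) * C + c;
}

// Copies an NHWC tensor into NCHW order, i.e. the data movement of a
// transpose with shuffle {0, 3, 1, 2}. Assumes src.size() == N*C*H*W.
std::vector<float> nhwcToNchw(const std::vector<float> &src, std::size_t N,
                              std::size_t C, std::size_t H, std::size_t W) {
  std::vector<float> dst(src.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t h = 0; h < H; ++h)
        for (std::size_t w = 0; w < W; ++w)
          dst[offsetNCHW(n, c, h, w, C, H, W)] =
              src[offsetNHWC(n, c, h, w, C, H, W)];
  return dst;
}
```

For example, with N = 1, C = 2, H = 1, W = 2, the NHWC buffer {1, 2, 3, 4} becomes {1, 3, 2, 4} in NCHW: the channels are interleaved per pixel in NHWC and stored as separate planes in NCHW.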

ptrblck · June 10, 2021, 4:37am

I don’t know which 3rd party libraries you are using, but note that, e.g., cuDNN can internally transpose your NCHW data to NHWC if a faster kernel is available and the overall performance, including the permutation, would still be better than with the alternative NCHW kernel. Depending on your custom backend, you might also be using a similar 3rd party library.

iviarcio · June 11, 2021, 11:31am

Thanks for the reply @ptrblck, but unfortunately no 3rd party libraries are involved. It is a new neural processing unit that performs operations such as convolution in hardware, so constants and placeholders must also be in NCHW layout.

jfix · June 11, 2021, 4:51pm

Hi @iviarcio, the OCL backend does this transformation for convolutions at the location below. Did you also copy this transforms file over for your backend? Perhaps it needs to be tweaked for your use case. I’d suggest debugging why the code below isn’t being run on your backend.

github.com/pytorch/glow/blob/778d44eff16f8dfac25503e0f917531d71bc950b/lib/Backends/OpenCL/Transforms.cpp#L62

```cpp
    // ... (excerpt starts inside the ConvolutionNode branch)
    if (CN->getLayout() == NCHW) {
      continue;
    }
    // If there is no compiler controlled local memory on the device,
    // try to avoid kernels that use (additional) copies to local memory.
    if (devInfo != nullptr && devInfo->availableLocalMemory == 0) {
      continue;
    }
    auto *NR = convertConvToNCHWConv(CN, F);
    CN->getResult().replaceAllUsesOfWith(NR);
    changed = true;
    continue;
  }
  if (auto *PMN = dyn_cast<MaxPoolNode>(&node)) {
    if (cctx.compMode == CompilationMode::Train) {
      continue;
    }
    if (PMN->getLayout() == NCHW) {
      // ... (excerpt truncated here)
```
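The pattern in this excerpt can be modelled on a toy chain of ops to see why some Transpose nodes can survive optimization. This is a minimal sketch with hypothetical types and functions (`Op`, `Kind`, `convertConvsToNCHW`, `foldTransposes`, `compose`), not Glow's API. It assumes, as its name suggests, that `convertConvToNCHWConv` wraps an NHWC convolution in NHWC-to-NCHW and NCHW-to-NHWC transposes; a second pass then merges adjacent transposes and drops the ones that cancel. The `hasLocalMemory` flag mirrors the `availableLocalMemory == 0` check above: a backend that copies this pass and reports zero local memory for its device will leave every convolution in NHWC.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical toy IR, not Glow's. A Transpose perm means: output dim i
// is input dim perm[i].
using Perm = std::array<unsigned, 4>;
constexpr Perm kNCHW2NHWC = {0, 2, 3, 1};
constexpr Perm kNHWC2NCHW = {0, 3, 1, 2};
constexpr Perm kIdentity = {0, 1, 2, 3};

enum class Kind { Transpose, Conv };
enum class Layout { NHWC, NCHW };

struct Op {
  Kind kind;
  Perm perm;     // only meaningful for Transpose
  Layout layout; // only meaningful for Conv
};

// Rewrites each NHWC conv as Transpose(NHWC->NCHW), Conv(NCHW),
// Transpose(NCHW->NHWC). Like the excerpt, it skips the rewrite when the
// device has no local memory.
std::vector<Op> convertConvsToNCHW(const std::vector<Op> &chain,
                                   bool hasLocalMemory) {
  std::vector<Op> out;
  for (const Op &op : chain) {
    if (op.kind != Kind::Conv || op.layout == Layout::NCHW ||
        !hasLocalMemory) {
      out.push_back(op);
      continue;
    }
    out.push_back({Kind::Transpose, kNHWC2NCHW, Layout::NHWC});
    out.push_back({Kind::Conv, kIdentity, Layout::NCHW});
    out.push_back({Kind::Transpose, kNCHW2NHWC, Layout::NHWC});
  }
  return out;
}

// Applying `first` then `second` equals one transpose with
// perm[i] = first[second[i]].
Perm compose(const Perm &first, const Perm &second) {
  Perm r{};
  for (unsigned i = 0; i < 4; ++i) r[i] = first[second[i]];
  return r;
}

// Merges consecutive transposes and removes those that become identities.
std::vector<Op> foldTransposes(const std::vector<Op> &chain) {
  std::vector<Op> out;
  for (const Op &op : chain) {
    if (op.kind == Kind::Transpose && !out.empty() &&
        out.back().kind == Kind::Transpose) {
      out.back().perm = compose(out.back().perm, op.perm);
      if (out.back().perm == kIdentity) out.pop_back();
      continue;
    }
    out.push_back(op);
  }
  return out;
}
```

Starting from [Transpose NCHW-to-NHWC (standing in for a transpose at the NCHW model input), Conv NHWC, Conv NHWC] with `hasLocalMemory = true`, the two passes leave [Conv NCHW, Conv NCHW, Transpose NCHW-to-NHWC]. The inner transposes cancel, but one remains wherever an NHWC tensor is still expected downstream. With `hasLocalMemory = false`, both convolutions stay in NHWC.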

iviarcio · June 14, 2021, 7:17pm

Thank you very much @jfix. I’ll check it out and, if you don’t mind, get back to you with any other questions or concerns.