Using optimised depthwise convolutions

I’ve had some offline discussions about this. Here are my findings:

  • thnn_conv_depthwise2d, the internal function that gets called, does use CUDA, but not cuDNN.
  • cuDNN 7’s implementation of grouped/depthwise convolution is up to 3x faster in the forward pass, but always slower in the backward pass.
  • Choosing when to use cuDNN and when not to is very difficult to express in a maintainable way. The decision would depend on many of the conv layer’s parameters and on the cuDNN version, and it probably depends on the GPU as well.
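
If you want to check the forward/backward trade-off on your own hardware, something like the rough sketch below may help. It times a 3x3 depthwise conv (groups equal to the number of channels) with cuDNN toggled on or off through torch.backends.cudnn.enabled. The function name time_depthwise and the default sizes are my own choices for illustration. Which kernel PyTorch actually dispatches to in each case depends on the PyTorch and cuDNN versions, so treat the numbers as a probe rather than a definitive comparison. On a machine without a GPU it falls back to the CPU, where the toggle has no effect.

```python
import time

import torch
import torch.nn as nn


def time_depthwise(channels=32, size=56, batch=8, use_cudnn=True, iters=10):
    """Return average (forward_seconds, backward_seconds) for a 3x3 depthwise conv.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # groups == in_channels is what makes the convolution depthwise.
    conv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels).to(device)
    x = torch.randn(batch, channels, size, size, device=device, requires_grad=True)
    # With padding=1 and a 3x3 kernel the output has the same shape as the input.
    grad_out = torch.randn(batch, channels, size, size, device=device)

    def sync():
        # CUDA kernels launch asynchronously, so wait before reading the clock.
        if device == "cuda":
            torch.cuda.synchronize()

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        # Warm-up so one-off initialisation costs are not measured.
        for _ in range(3):
            conv(x).backward(grad_out)
        sync()

        fwd = bwd = 0.0
        for _ in range(iters):
            t0 = time.perf_counter()
            y = conv(x)
            sync()
            t1 = time.perf_counter()
            y.backward(grad_out)
            sync()
            t2 = time.perf_counter()
            fwd += t1 - t0
            bwd += t2 - t1
    finally:
        # Restore the global setting so other code is unaffected.
        torch.backends.cudnn.enabled = previous

    return fwd / iters, bwd / iters
```

Calling time_depthwise(use_cudnn=True) and time_depthwise(use_cudnn=False) for a few channel counts and input sizes should give a feel for how much the answer moves around with the layer parameters.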

I think it’s probably best to wait until CUDA/cuDNN provide consistent benefits in more situations.