Using optimised depthwise convolutions

swibe · January 17, 2018, 9:20am

I’ve had some offline discussions about this. Here are my findings:

thnn_conv_depthwise2d, the internal function that gets called, does use CUDA, but not cuDNN.
cuDNN 7’s implementation of grouped/depthwise convolution is up to 3x quicker in the forward pass, but always slower in the backward pass.
Choosing when to use cuDNN and when not to is very difficult to describe in a maintainable way. It would involve looking at many of the conv layer’s parameters and the current version of cuDNN, and it probably depends on the GPU used as well.
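
If you want to check the trade-off on your own hardware, here is a rough timing sketch. It is only an illustration: the helper name `time_depthwise` and the layer sizes are made up, and it assumes that toggling `torch.backends.cudnn.enabled` actually changes which kernel your PyTorch build picks for a depthwise layer, which may not be true for every version. It falls back to the CPU when no GPU is available, in which case the two timings should be about the same.

```python
import time

import torch
import torch.nn as nn


def time_depthwise(use_cudnn, channels=32, size=56, batch=8, iters=10):
    """Return (forward_ms, backward_ms), averaged over `iters` timed runs.

    Hypothetical helper for illustration; the default sizes are arbitrary.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        # groups == in_channels is what makes this a depthwise convolution.
        conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                         groups=channels, bias=False).to(device)
        x = torch.randn(batch, channels, size, size,
                        device=device, requires_grad=True)

        def sync():
            # CUDA kernels launch asynchronously, so wait before reading the clock.
            if device == "cuda":
                torch.cuda.synchronize()

        fwd = bwd = 0.0
        for i in range(iters + 1):  # iteration 0 is an untimed warm-up
            sync()
            t0 = time.perf_counter()
            out = conv(x)
            sync()
            t1 = time.perf_counter()
            out.sum().backward()  # backward time includes the cheap sum()
            sync()
            t2 = time.perf_counter()
            if i > 0:
                fwd += t1 - t0
                bwd += t2 - t1
        return 1000 * fwd / iters, 1000 * bwd / iters
    finally:
        # Restore the global flag so the rest of the program is unaffected.
        torch.backends.cudnn.enabled = previous


if __name__ == "__main__":
    for flag in (False, True):
        f, b = time_depthwise(flag)
        print(f"cudnn.enabled={flag}: forward {f:.3f} ms, backward {b:.3f} ms")
```

Numbers from a script like this only tell you about one layer shape on one GPU and one cuDNN version, which is exactly why a general rule is hard to write down.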

I think it’s probably a case of waiting until CUDA/cuDNN provide consistent benefits in more situations.