I’ve had some offline discussions about this. Here are my findings:
- `thnn_conv_depthwise2d`, the internal function called, does use CUDA, but not CuDNN.
- CuDNN 7's implementation of grouped/depthwise convolution is up to 3x faster in the forward pass, but consistently slower in the backward pass.
- A heuristic for when to use CuDNN and when not to is very difficult to express in a maintainable way: it would depend on many of the conv layer's parameters, on the CuDNN version in use, and probably on the GPU as well.
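To make the maintainability problem concrete, here is a minimal sketch of what such a dispatch heuristic might look like. This is illustrative only, not PyTorch's actual dispatch logic; the function name, version threshold, and branches are all assumptions, and a real heuristic would need many more per-GPU and per-shape conditions.

```python
# Hypothetical sketch of why a CuDNN dispatch heuristic is hard to maintain:
# every branch encodes a benchmark result that can change with the CuDNN
# version or the GPU. Names and thresholds are illustrative, not PyTorch's.

def use_cudnn_for_depthwise(cudnn_version: int,
                            is_forward: bool,
                            groups: int,
                            in_channels: int) -> bool:
    """Decide whether to dispatch a depthwise conv to CuDNN (illustrative)."""
    depthwise = groups == in_channels
    if not depthwise or cudnn_version < 7000:
        # Pre-7 CuDNN had no fast grouped-convolution path here.
        return False
    # CuDNN 7 is faster in the forward pass but slower in backward,
    # so a correct heuristic already has to split on pass direction...
    if not is_forward:
        return False
    # ...and in practice would also need per-GPU, per-shape conditions here.
    return True

print(use_cudnn_for_depthwise(7102, True, 32, 32))   # forward depthwise pass
print(use_cudnn_for_depthwise(7102, False, 32, 32))  # backward: fall back
```

Even this toy version has to thread through the pass direction, the group structure, and the library version; the real decision surface is much larger, which is why hardcoding it is unattractive.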
I think it’s probably a case of waiting until CUDA/CuDNN provide consistent benefits in more situations.