Hi all,
Following #3057 and #3265, I was excited to try out depthwise separable convolutions, but I’m having a hard time hitting the optimised code paths: I’m currently seeing no speedup over standard convolutions.
Here are the two layer types that make up the bulk of my network:
# Depthwise: each input channel convolved with its own k filters
nn.Conv2d(in_chans, in_chans * k, kernel_size, groups=in_chans)
# Pointwise: 1x1 convolution mixing channels
nn.Conv2d(in_chans * k, out_chans, 1)
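For reference, the saving this factorisation should buy is easy to estimate. A back-of-the-envelope sketch (pure Python, no PyTorch; the variable names mirror the layer arguments above, and the concrete sizes at the bottom are just illustrative):

```python
# Rough MAC (multiply-accumulate) count per output pixel, assuming
# 'same' padding so the spatial size is unchanged. k is the
# depthwise channel multiplier from the layers above.
def standard_macs(in_chans, out_chans, kernel_size):
    # One K x K x Cin dot product per output channel.
    return in_chans * out_chans * kernel_size ** 2

def separable_macs(in_chans, out_chans, kernel_size, k=1):
    depthwise = in_chans * k * kernel_size ** 2  # groups=in_chans
    pointwise = in_chans * k * out_chans         # 1x1 convolution
    return depthwise + pointwise

if __name__ == "__main__":
    # Illustrative sizes, not my actual layer widths.
    std = standard_macs(64, 128, 3)
    sep = separable_macs(64, 128, 3)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
    # -> standard: 73728, separable: 8768, ratio: 8.4x
```

So for a typical 3x3 layer the separable pair should need roughly 8-9x fewer MACs, which is why seeing no speedup at all is suspicious.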
If I profile the network’s execution, I get the following (trimmed):
-------------------------  ------------  ------------  ------------
Name                           CPU time     CUDA time         Calls
-------------------------  ------------  ------------  ------------
conv2d                        130.737us    1016.190us            14
cudnn_convolution             160.005us     843.504us             8
thnn_conv_depthwise2d          58.475us    1246.438us             6
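For context, the table was collected along these lines (a minimal sketch using `torch.autograd.profiler`; the layers and input shape here are placeholders, not my actual network):

```python
import torch
from torch.autograd import profiler

# Placeholder depthwise + pointwise pair, not my real network.
net = torch.nn.Sequential(
    torch.nn.Conv2d(16, 16, 3, padding=1, groups=16),  # depthwise
    torch.nn.Conv2d(16, 32, 1),                        # pointwise
)
x = torch.randn(1, 16, 32, 32)

# Pass use_cuda=True (with a CUDA tensor) to get the CUDA time column.
with profiler.profile() as prof:
    net(x)
print(prof.key_averages())
```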
My concern is that the depthwise convolutions are being handled by THNN rather than THCUNN, which is where the new optimisations live. I also have a second network that replaces each standard convolution with multiple depthwise convolutions; it runs much slower despite performing far less computation.
Am I missing something obvious?
>>> torch.__version__
'0.4.0a0+82e995e'
>>> torch.version.cuda
'8.0.61'
>>> torch.backends.cudnn.version()
7005