No Speedup with Depthwise Convolutions

Hi Tom,

What would be a proper way to benchmark depthwise_conv2d against a normal conv layer? I stacked 50 layers of each kind to test whether the depthwise version is faster than the normal one. Is that a fair and straightforward way to do it?
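
For context, this is roughly the timing loop I have in mind (a sketch, not my exact code; I am assuming the GPU timings need torch.cuda.synchronize() and a few warm-up iterations to be meaningful):

import time
import torch

def benchmark(model, x, n_iters=20, warmup=5):
    # model and x are assumed to already be on the GPU
    for _ in range(warmup):          # warm-up so cuDNN can pick its algorithms
        model(x).sum().backward()
    torch.cuda.synchronize()         # GPU kernels run asynchronously; flush them first
    start = time.time()
    for _ in range(n_iters):
        model(x).sum().backward()
    torch.cuda.synchronize()         # wait for the GPU to finish before stopping the clock
    return (time.time() - start) / n_iters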

This GitHub issue says that even though depthwise convolutions have been implemented in cuDNN 7, on average they are no faster than PyTorch's native implementation.

But I saw no improvement in my own experiment, as described above.
Is there anything that needs to be configured specifically (such as channels, kernel_size, or backends) to benefit from depthwise_conv2d?
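
For example, is something along these lines required? (torch.backends.cudnn.benchmark is the only backend switch I know of, so this is just a guess:)

import torch
torch.backends.cudnn.benchmark = True  # let cuDNN auto-tune conv algorithms for fixed input shapes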

Thanks in advance

EDIT:
The two 50-layer stacked models are as follows:

normal_conv_model
layer1 : conv2d(3, 256, 3, padding=1, groups=1)
layer2 to layer49: conv2d(256, 256, 3, padding=1, groups=1)
layer50: conv2d(256, 10, 3, padding=1, groups=1) # for CrossEntropyLoss
         conv2d(256, 3, 3, padding=1, groups=1)  # for MSELoss

separable_conv_model
layer1 : conv2d(3, 256, 3, padding=1, groups=1)
layer2 to layer49: separable_conv2d(256, 256, 3)
layer50: conv2d(256, 10, 3, padding=1, groups=1) # for CrossEntropyLoss
         conv2d(256, 3, 3, padding=1, groups=1)  # for MSELoss
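
Here separable_conv2d is the MobileNet-style depthwise + pointwise block, roughly like the sketch below (the padding and the default bias on both convs are my choices here, made so the spatial size is preserved):

import torch.nn as nn

class SeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # depthwise: one kernel_size x kernel_size filter per input channel (groups = in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # pointwise: 1x1 conv that mixes channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))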

Input and output:

random_input = torch.randn((1, 3, 256, 256))
random_output = torch.randint(low=0, high=10, size=(1,256,256)) # for crossentropy
random_output = torch.randn((1, 3, 256, 256)) # for MSELoss

Each model is trained on the GPU with CUDA 9.0, cuDNN 7, and PyTorch 1.0.1.post2.
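
The trainable-parameter counts below were obtained along these lines (a sketch; count_trainable_parameters is just an illustrative helper name):

def count_trainable_parameters(model):
    # sum over every parameter tensor that will receive gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)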

Trainable parameters and forward & backward time cost are as follows:
CrossEntropyLoss and Adam optimizer:

Trainable Parameters:
Normal_conv2d    : 28354058
Separable_conv2d : 3311114
Time cost:
Normal_conv2d   : 0.5144641399383545s
Separable_conv2d: 0.5536670684814453s

CrossEntropyLoss and SGD optimizer:

Trainable Parameters:
Normal_conv2d   : 28354058
Separable_conv2d: 3311114
Time cost:
Normal_conv2d   : 0.11238956451416016s
Separable_conv2d: 0.03952765464782715s

MSELoss and Adam optimizer:

Trainable Parameters:
Normal_conv2d   : 28337923
Separable_conv2d: 3294979
Time cost:
Normal_conv2d   : 0.5181684494018555s
Separable_conv2d: 0.5568540096282959s

MSELoss and SGD optimizer:

Trainable Parameters:
Normal_conv2d   : 28337923
Separable_conv2d: 3294979
Time cost:
Normal_conv2d   : 0.17907309532165527s
Separable_conv2d: 0.07207584381103516s

Note that:

  • separable_conv2d consists of a depthwise_conv2d followed by a pointwise_conv2d, as in MobileNet (see the sketch under separable_conv_model above).
  • The models have more parameters with CrossEntropyLoss because of the last layer's out_channels (10 instead of 3).
  • Training is faster with the SGD optimizer and CrossEntropyLoss.