Hi Tom,
What do you think is a proper way to compare the performance of depthwise_conv2d
against a normal conv layer? I stacked 50 layers to test whether the depthwise one is faster than the normal one. Is that a fair and straightforward way to test it?
This GitHub issue says that even though depthwise conv has been implemented in cuDNN 7, on average it is no better than PyTorch's own implementation.
However, I got no improvement in my own experiment described above.
Is there anything that needs to be configured (such as channels, kernel_size or backends) specifically to use depthwise_conv2d?
Thanks in advance
EDIT:
The two stacked 50-layer models are as follows:
normal_conv_model
layer1 : conv2d(3, 256, 3, padding=1, groups=1)
layer2 to layer49: conv2d(256, 256, 3, padding=1, groups=1)
layer50: conv2d(256, 10, 3, padding=1, groups=1) # for CrossEntropyLoss
conv2d(256, 3, 3, padding=1, groups=1) # for MSELoss
separable_conv_model
layer1 : conv2d(3, 256, 3, padding=1, groups=1)
layer2 to layer49: separable_conv2d(256, 256, 3)
layer50: conv2d(256, 10, 3, padding=1, groups=1) # for CrossEntropyLoss
conv2d(256, 3, 3, padding=1, groups=1) # for MSELoss
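For reference, a separable_conv2d layer of this kind can be sketched roughly as below. This is a minimal sketch, not necessarily the exact code used in the experiment: the class name SeparableConv2d is only for illustration, and it assumes a depthwise conv (groups equal to the input channels) followed by a 1x1 pointwise conv, both with bias and "same" padding.

```python
import torch
import torch.nn as nn


class SeparableConv2d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (illustrative name)."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # Depthwise: one filter per input channel (groups == in_channels).
        self.depthwise = nn.Conv2d(
            in_channels,
            in_channels,
            kernel_size,
            padding=kernel_size // 2,  # keeps the spatial size for odd kernels
            groups=in_channels,
        )
        # Pointwise: 1x1 conv that mixes channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


layer = SeparableConv2d(256, 256, 3)
out = layer(torch.randn(1, 256, 32, 32))
n_params = sum(p.numel() for p in layer.parameters())
print(tuple(out.shape), n_params)
```

The key setting is groups=in_channels on the first conv; with groups=1 it would be an ordinary convolution.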
input and output:
random_input = torch.randn((1, 3, 256, 256))
random_output = torch.randint(low=0, high=10, size=(1,256,256)) # for crossentropy
random_output = torch.randn((1, 3, 256, 256)) # for MSELoss
Each model is trained on GPU with CUDA 9.0, cuDNN 7, and PyTorch 1.0.1.post2.
Parameter counts and forward & backward time costs are as follows:
CrossEntropyLoss and Adam optimizer:
Trainable Parameters:
Normal_conv2d : 28354058
Separable_conv2d : 3311114
Time cost:
Normal_conv2d : 0.5144641399383545s
Separable_conv2d: 0.5536670684814453s
CrossEntropyLoss and SGD optimizer:
Trainable Parameters:
Normal_conv2d : 28354058
Separable_conv2d: 3311114
Time cost:
Normal_conv2d : 0.11238956451416016s
Separable_conv2d: 0.03952765464782715s
MSELoss and Adam optimizer:
Trainable Parameters:
Normal_conv2d : 28337923
Separable_conv2d: 3294979
Time cost:
Normal_conv2d : 0.5181684494018555s
Separable_conv2d: 0.5568540096282959s
MSELoss and SGD optimizer:
Trainable Parameters:
Normal_conv2d : 28337923
Separable_conv2d: 3294979
Time cost:
Normal_conv2d : 0.17907309532165527s
Separable_conv2d: 0.07207584381103516s
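The parameter counts above can be checked by hand. The sketch below is plain arithmetic, assuming every conv has a bias, the last layer uses a 3x3 kernel like the others, and separable_conv2d is a 3x3 depthwise conv plus a 1x1 pointwise conv; the helper names are made up for illustration.

```python
def conv2d_params(in_ch, out_ch, k, groups=1, bias=True):
    """Parameter count of a conv2d layer with a k x k kernel."""
    weights = (in_ch // groups) * out_ch * k * k
    return weights + (out_ch if bias else 0)


def separable_params(in_ch, out_ch, k):
    """Depthwise (groups == in_ch) plus 1x1 pointwise conv."""
    depthwise = conv2d_params(in_ch, in_ch, k, groups=in_ch)
    pointwise = conv2d_params(in_ch, out_ch, 1)
    return depthwise + pointwise


def model_params(middle_layer_params, num_outputs):
    """layer1 + 48 middle layers (layer2 to layer49) + layer50."""
    first = conv2d_params(3, 256, 3)
    last = conv2d_params(256, num_outputs, 3)
    return first + 48 * middle_layer_params + last


normal = conv2d_params(256, 256, 3)
separable = separable_params(256, 256, 3)

normal_ce = model_params(normal, 10)        # CrossEntropyLoss head
separable_ce = model_params(separable, 10)
normal_mse = model_params(normal, 3)        # MSELoss head
separable_mse = model_params(separable, 3)

print(normal_ce, separable_ce)    # 28354058 3311114
print(normal_mse, separable_mse)  # 28337923 3294979
```

Under those assumptions the numbers match the reported trainable parameters, so the separable model has roughly 8.6x fewer parameters.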
Note that:
- separable_conv2d includes depthwise_conv2d and pointwise_conv2d, as described in MobileNet.
- The model has more parameters with the CrossEntropy loss because of the last layer's out_channels.
- It is faster using the SGD optimizer and CrossEntropy loss.
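One thing I am not sure about is whether a single timed step is a fair measurement. Below is a minimal sketch of a timing helper, assuming a few warm-up runs and an average over several iterations; time_step, step_fn and sync_fn are hypothetical names, and this is not necessarily how the numbers above were taken.

```python
import time


def time_step(step_fn, sync_fn=None, warmup=3, iters=10):
    """Return the average wall-clock seconds per call of step_fn.

    sync_fn, if given, is called before starting and before stopping
    the clock so that pending asynchronous work is included.
    """
    for _ in range(warmup):
        step_fn()  # warm-up runs are not timed
    if sync_fn is not None:
        sync_fn()
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    if sync_fn is not None:
        sync_fn()
    return (time.perf_counter() - start) / iters


# Dummy step just to show the call; a real step would run
# forward, loss, backward and optimizer.step().
calls = []
avg = time_step(lambda: calls.append(1), warmup=2, iters=5)
print(len(calls), avg)
```

On GPU I would pass sync_fn=torch.cuda.synchronize, since CUDA kernels are launched asynchronously and the clock could otherwise stop before the work has finished.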