Depthwise Separable Convolution is Slower than Standard Convolution


I am trying to implement a depthwise separable 1D convolution in PyTorch to process very long 1D inputs, with the goal of cutting down parameter count and model latency. As I cannot find an off-the-shelf implementation in torch, I have (following other posts) written my own:

class depthwise_separable_conv(nn.Module):

    def __init__(self, nin, nout, depth_kernel_size, stride_size):
        super().__init__()
        # Depthwise stage: one filter per input channel (groups=nin).
        self.depthwise = nn.Conv1d(nin, nin, kernel_size=depth_kernel_size, stride=stride_size, padding=0, groups=nin)
        # Pointwise stage: kernel_size=1 convolution that mixes channels.
        self.pointwise = nn.Conv1d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out
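To make the comparison concrete, here is a minimal timing sketch (sizes are illustrative, not my actual model; CPU, no autograd) that measures this module against a plain `nn.Conv1d` with the same input/output shapes:

```python
import time
import torch
import torch.nn as nn

class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout, depth_kernel_size, stride_size):
        super().__init__()
        self.depthwise = nn.Conv1d(nin, nin, kernel_size=depth_kernel_size,
                                   stride=stride_size, padding=0, groups=nin)
        self.pointwise = nn.Conv1d(nin, nout, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Illustrative sizes only.
nin, nout, k, length = 64, 128, 9, 4096
x = torch.randn(8, nin, length)

separable = depthwise_separable_conv(nin, nout, k, stride_size=1)
standard = nn.Conv1d(nin, nout, kernel_size=k, stride=1, padding=0)

def time_module(m, x, iters=20):
    # Average forward-pass time over several iterations, after one warm-up run.
    with torch.no_grad():
        m(x)  # warm-up
        t0 = time.perf_counter()
        for _ in range(iters):
            m(x)
    return (time.perf_counter() - t0) / iters

print(f"separable: {time_module(separable, x) * 1e3:.2f} ms")
print(f"standard:  {time_module(standard, x) * 1e3:.2f} ms")
```

(On GPU one would additionally need `torch.cuda.synchronize()` around the timers, or `torch.utils.benchmark`, for the numbers to be meaningful.)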

However, I have noticed that the separable convolution ends up taking noticeably longer than the standard 1D convolution during training. I am wondering:

  1. Is there some error with my implementation?
  2. If the implementation is correct, what might be the reason for the slowness of this implementation?
  3. What do I do to speed it up?
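For context, a quick sanity check (again with illustrative sizes) confirms the separable version really does have far fewer parameters, so the slowdown is a runtime effect rather than a model-size one:

```python
import torch.nn as nn

nin, nout, k = 64, 128, 9  # illustrative sizes

standard = nn.Conv1d(nin, nout, kernel_size=k)
depthwise = nn.Conv1d(nin, nin, kernel_size=k, groups=nin)
pointwise = nn.Conv1d(nin, nout, kernel_size=1)

def n_params(*modules):
    # Total number of learnable parameters across the given modules.
    return sum(p.numel() for m in modules for p in m.parameters())

print(f"standard:  {n_params(standard):,}")               # nin*nout*k + nout
print(f"separable: {n_params(depthwise, pointwise):,}")   # nin*(k+1) + nin*nout + nout
```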

I'm facing the same issue. Did you ever find a solution?