I am trying to implement a depthwise separable 1D convolution in PyTorch to process very long 1D inputs, with the goal of cutting down parameter count and model latency. As I cannot find an off-the-shelf implementation in torch, I have (following other posts) written my own:
```python
class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout, depth_kernel_size, stride_size):
        super(depthwise_separable_conv, self).__init__()
        # depthwise: one filter per input channel (groups=nin)
        self.depthwise = nn.Conv1d(nin, nin, kernel_size=depth_kernel_size,
                                   stride=stride_size, padding=0, groups=nin)
        # pointwise: 1x1 convolution to mix channels
        self.pointwise = nn.Conv1d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out
```
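For reference, the parameter counts do come out much lower than for a standard convolution, so the layer seems to do what it should on that front (the channel and kernel sizes below are illustrative, not my actual model's):

```python
import torch.nn as nn

# illustrative sizes, not my actual model's
nin, nout, k = 64, 128, 7

# depthwise (groups=nin) + pointwise (kernel_size=1), as in the module above
separable_params = (
    sum(p.numel() for p in nn.Conv1d(nin, nin, k, groups=nin).parameters())
    + sum(p.numel() for p in nn.Conv1d(nin, nout, 1).parameters())
)
# equivalent standard 1D convolution
standard_params = sum(p.numel() for p in nn.Conv1d(nin, nout, k).parameters())

print(separable_params, standard_params)  # separable is far smaller
```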
However, I have noticed that the separable convolution ends up taking noticeably longer than a standard 1D convolution during training. I am wondering:
- Is there some error with my implementation?
- If the implementation is correct, what might be the reason it is slower?
- What can I do to speed it up?
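For context, this is roughly how I compared the two variants; the batch size, channel counts, and input length are illustrative, not my real training setup:

```python
import time
import torch
import torch.nn as nn

# illustrative sizes, not my real training setup
nin, nout, k, length, batch = 64, 128, 7, 4096, 8

separable = nn.Sequential(
    nn.Conv1d(nin, nin, k, groups=nin),  # depthwise
    nn.Conv1d(nin, nout, 1),             # pointwise
)
standard = nn.Conv1d(nin, nout, k)

x = torch.randn(batch, nin, length)

def time_module(m, x, iters=20):
    # one warm-up pass, then average wall-clock time over several forward passes
    m(x)
    t0 = time.perf_counter()
    for _ in range(iters):
        m(x)
    return (time.perf_counter() - t0) / iters

print(f"separable: {time_module(separable, x):.6f} s")
print(f"standard:  {time_module(standard, x):.6f} s")
```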