I am trying to implement a depthwise separable 1D convolution in PyTorch to process very long 1D inputs, with the goal of cutting down parameter count and model latency. As I cannot find an off-the-shelf implementation in torch, I have (following other posts) written my own:
```python
class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout, depth_kernel_size, stride_size):
        super(depthwise_separable_conv, self).__init__()
        # depthwise: one filter per input channel (groups=nin)
        self.depthwise = nn.Conv1d(nin, nin, kernel_size=depth_kernel_size,
                                   stride=stride_size, padding=0, groups=nin)
        # pointwise: 1x1 convolution to mix channels
        self.pointwise = nn.Conv1d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out
```
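For reference, the parameter counts do come out much lower than for a standard convolution, so the layer seems to do what it should on that front (the channel and kernel sizes below are illustrative, not my actual model's):

```python
import torch.nn as nn

# illustrative sizes, not my actual model's
nin, nout, k = 64, 128, 7

# depthwise (groups=nin) + pointwise (kernel_size=1), as in the module above
separable_params = (
    sum(p.numel() for p in nn.Conv1d(nin, nin, k, groups=nin).parameters())
    + sum(p.numel() for p in nn.Conv1d(nin, nout, 1).parameters())
)
# equivalent standard 1D convolution
standard_params = sum(p.numel() for p in nn.Conv1d(nin, nout, k).parameters())

print(separable_params, standard_params)  # separable is far smaller
```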
However, I have noticed that the separable convolution ends up taking noticeably longer than a standard 1D convolution during training. I am wondering:
- Is there some error with my implementation?
- If the implementation is correct, what might be the reason it is slower?
- What can I do to speed it up?
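For context, this is roughly how I compared the two variants; the batch size, channel counts, and input length are illustrative, not my real training setup:

```python
import time
import torch
import torch.nn as nn

# illustrative sizes, not my real training setup
nin, nout, k, length, batch = 64, 128, 7, 4096, 8

separable = nn.Sequential(
    nn.Conv1d(nin, nin, k, groups=nin),  # depthwise
    nn.Conv1d(nin, nout, 1),             # pointwise
)
standard = nn.Conv1d(nin, nout, k)

x = torch.randn(batch, nin, length)

def time_module(m, x, iters=20):
    # one warm-up pass, then average wall-clock time over several forward passes
    m(x)
    t0 = time.perf_counter()
    for _ in range(iters):
        m(x)
    return (time.perf_counter() - t0) / iters

print(f"separable: {time_module(separable, x):.6f} s")
print(f"standard:  {time_module(standard, x):.6f} s")
```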