Error when doing depthwise conv2d

My problem is a little weird: I am trying to use a depthwise convolution to apply a Sobel filter to every channel of a tensor simultaneously. When I do it on a single-channel tensor, it works fine. The code is as follows:

import torch
import torch.nn.functional as F
from torch.autograd import Variable

input_img = torch.FloatTensor(input_image)  # 1080*1080
input_img = input_img.unsqueeze(0).unsqueeze(0)  # 1*1*1080*1080

# Sobel kernel (x direction)
a = torch.Tensor([[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]])
x = Variable(input_img.float(), requires_grad=False)
w = Variable(a.float(), requires_grad=False)
w = w.unsqueeze(1)  # 1*1*3*3

gx = F.conv2d(x, w, padding=1, groups=1)
gx = torch.relu(torch.tanh(gx))
gx = gx.squeeze(0)

The input image is the first image, and the output is the second image.

However, when it comes to a multi-channel tensor, I put the above input image in the first channel of the input tensor and set groups=5. The code is as follows:

input_img = torch.FloatTensor(multi_channel_input)  # 5*1080*1080
input_img = input_img.unsqueeze(0)  # 1*5*1080*1080
a = torch.Tensor([[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]])

x = Variable(input_img.float(), requires_grad=False)
w = Variable(a.float(), requires_grad=False)
w = w.unsqueeze(1)

gx = F.conv2d(x, w, padding=1, groups=5)
gx = torch.relu(torch.tanh(gx))

I expected each Sobel kernel to operate independently on its own channel of the input tensor, just like a depthwise convolution, so the output of the first channel should be identical to the single-channel result above. But the actual output is the third image; the second one obviously looks more correct.

Did I set up the depthwise conv2d incorrectly, or did I make some other mistake?

Your code seems to work on this dummy example:

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

input_img = torch.zeros(5, 100, 100)
input_img[:, 25:75, 25:75] = 1.  # white square on a black background
input_img = input_img.unsqueeze(0)

a = torch.tensor([[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
                  [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]]).float()

x = input_img
w = a
w = w.unsqueeze(1)
gx = F.conv2d(x, w, padding=1, groups=5)

f, axarr = plt.subplots(5)
for idx in range(5):
    axarr[idx].imshow(gx[0, idx].numpy())

Thank you very much, it works. I found I had just made a mistake with the input image.
By the way, is there any way to apply a fixed-size conv filter to every channel of the input simultaneously, rather than copying the filter 5 times as I did in my code?

expand(5, -1, -1, -1) or repeat(5, 1, 1, 1) should work:

gx = F.conv2d(x, w.expand(5, -1, -1, -1), padding=1, groups=5)

If you want to manipulate w in-place, you should use repeat, since it allocates new memory, while expand reuses the same memory and just changes the stride and size.
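
A small sketch illustrating the difference (using a dummy zero tensor):

```python
import torch

w = torch.zeros(1, 1, 3, 3)

e = w.expand(5, -1, -1, -1)  # a view: no copy, stride 0 along dim 0
r = w.repeat(5, 1, 1, 1)     # a real copy: 5x the memory

print(e.stride())  # (0, 9, 3, 1) -- zero stride: all 5 "copies" share storage
print(r.stride())  # (9, 9, 3, 1) -- contiguous, independent storage

# mutating w shows through the expand view but not through the repeat copy
w[0, 0, 0, 0] = 42.
print(e[4, 0, 0, 0].item())  # 42.0
print(r[4, 0, 0, 0].item())  # 0.0
```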