Convolution operation without the final summation

Saeed_Izadi1 · September 21, 2019, 12:20am

Hello

Is there any way to perform a vanilla convolution operation but without the final summation? Assume that we have a feature map, X, of size [B, 3, 64, 64] and a single kernel of size [1, 3, 3, 3]. A vanilla convolution gives a feature map of size [B, 1, 62, 62], whereas I’m after a way to get a feature map of size [B, 3, 62, 62], i.e. the result just before all the convolutional channels are summed into a single feature map.

Thanks

ptrblck · September 21, 2019, 11:44am

How would you like to perform the reduction of each step?
Generally, you could unfold the input into 3x3x3 patches, perform the multiplication with the kernel, (sum the result), and fold/reshape to the output shape. Since you are not performing the sum, you would have overlapping patches and I’m not sure how you would like to reduce/reshape them back.
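A minimal sketch of the unfold route, assuming the goal is to keep the channel axis and sum only over the 3x3 spatial taps (so nothing overlaps when reshaping back). `F.unfold` lays each patch out as channel-major `C*k*k` values, which can be split into `[C, k*k]` before multiplying with the kernel. Sizes follow the question; `B = 2` is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

B, C, H, W, k = 2, 3, 64, 64, 3
x = torch.randn(B, C, H, W)
weight = torch.randn(1, C, k, k)  # the single [1, 3, 3, 3] kernel from the question

# Extract all 3x3 patches: [B, C*k*k, L] with L = (H-k+1)*(W-k+1)
patches = F.unfold(x, kernel_size=k)
# Split the channel axis from the spatial taps: [B, C, k*k, L]
patches = patches.view(B, C, k * k, -1)
# Multiply by the kernel and sum over the k*k taps only, keeping channels: [B, C, L]
per_channel = (patches * weight.view(1, C, k * k, 1)).sum(dim=2)
# Restore the spatial layout of the output: [B, C, H-k+1, W-k+1]
per_channel = per_channel.view(B, C, H - k + 1, W - k + 1)

print(per_channel.shape)  # torch.Size([2, 3, 62, 62])

# Summing over the channel axis recovers the vanilla convolution
ref = F.conv2d(x, weight)
print(torch.allclose(per_channel.sum(dim=1, keepdim=True), ref, atol=1e-4))
```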

Saeed_Izadi1 · September 23, 2019, 6:11pm

Thanks for the reply.

I want to avoid the reduction across the channels, not the spatial multiplication.
So each kernel of size 3x3x3 would give three feature maps instead of merging them into a single feature map in the output.

ptrblck · September 23, 2019, 6:31pm

In that case, the groups argument should yield the expected results using groups=in_channels:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 3, groups=3)
x = torch.randn(10, 3, 64, 64)
output = conv(x)  # shape [10, 3, 62, 62]
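The grouped layer above starts from random weights. A sketch of how a specific [1, 3, 3, 3] kernel could be loaded into it, assuming the goal is to split that one kernel across the input channels: with `groups=3` the weight has shape [3, 1, 3, 3], so the kernel only needs a reshape. The final check confirms that summing the three output maps gives back the vanilla convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(10, 3, 64, 64)
kernel = torch.randn(1, 3, 3, 3)  # a single kernel: [out=1, in=3, 3, 3]

# groups=3: each output channel sees exactly one input channel,
# so the weight holds one 3x3 slice per input channel: [3, 1, 3, 3]
conv = nn.Conv2d(3, 3, 3, groups=3, bias=False)
with torch.no_grad():
    # Slice c of the original kernel becomes the filter for input channel c
    conv.weight.copy_(kernel.view(3, 1, 3, 3))

per_channel = conv(x)          # [10, 3, 62, 62], channels kept separate
vanilla = F.conv2d(x, kernel)  # [10, 1, 62, 62], channels summed

print(per_channel.shape, vanilla.shape)
print(torch.allclose(per_channel.sum(dim=1, keepdim=True), vanilla, atol=1e-4))
```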

Saeed_Izadi1 · September 23, 2019, 8:25pm

I’m already familiar with this option but using groups is not actually what I need.

The operation you wrote just performs convolution with a single 3x3x3 kernel, while I need the same operation for, say, 32 different kernels.
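One way to extend the grouped approach to K kernels at once (a sketch, assuming K = 32 and the weight layout of an ordinary `Conv2d(3, 32, 3)`): use `groups=3` with `3 * K` output channels, order the filters so that each group holds the K slices belonging to one input channel, then reshape the result to `[B, K, 3, H', W']`.

```python
import torch
import torch.nn.functional as F

B, C, K = 4, 3, 32
x = torch.randn(B, C, 64, 64)
weight = torch.randn(K, C, 3, 3)  # 32 ordinary [3, 3, 3] kernels

# With groups=C, output channel o reads input channel o // K,
# so order the filters channel-major: index c*K + k holds weight[k, c]
grouped_weight = weight.permute(1, 0, 2, 3).reshape(C * K, 1, 3, 3)
out = F.conv2d(x, grouped_weight, groups=C)  # [B, C*K, 62, 62]

# Unflatten to [B, C, K, 62, 62], then put the kernel axis first: [B, K, C, 62, 62]
out = out.view(B, C, K, 62, 62).permute(0, 2, 1, 3, 4)

print(out.shape)  # torch.Size([4, 32, 3, 62, 62])
# Summing over the per-channel axis reproduces the ordinary convolution
print(torch.allclose(out.sum(dim=2), F.conv2d(x, weight), atol=1e-4))
```

The channel-major ordering matters: `Conv2d` assigns consecutive output channels to the same group, so the permute before the reshape is what keeps each slice paired with the right input channel.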

It’s a bit hard to explain what I am after.

Thanks for the time

phan_phan · September 24, 2019, 10:59am

Try to give an example with simple tensors and simple kernels, along with the expected results, so we can see what you want to do.

DavidWRomero · December 30, 2019, 12:23pm

Hi @Saeed_Izadi1, did you find a solution to this problem?

@ptrblck What Saeed was after is the following:
A normal convolution between a [Bx3x5x5] input and a [1x3x3x3] kernel produces a [Bx1x3x3] response. That is because, after the spatial convolution is performed channel-wise, i.e. along the last 2 indices (producing a tensor [Bx1x3x3x3]), the channel-wise responses are summed into a single channel, so the convolution produces a tensor [Bx1x3x3]. Is there a way to access the tensor [Bx1x3x3x3]?
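A small worked example with simple tensors, assuming no padding and stride 1 (so a 5x5 input with a 3x3 kernel gives a 3x3 output). Input channel c is filled with the constant c + 1 and the kernel is all ones, so each per-channel response is 9 * (c + 1) and the summed response is 9 * (1 + 2 + 3) = 54.

```python
import torch
import torch.nn.functional as F

B = 2
# Channel c of the input is the constant c + 1 (values 1, 2, 3)
x = torch.arange(1., 4.).view(1, 3, 1, 1).expand(B, 3, 5, 5).contiguous()
kernel = torch.ones(1, 3, 3, 3)

# Vanilla convolution: channels summed, 54 everywhere
vanilla = F.conv2d(x, kernel)  # [B, 1, 3, 3]

# Per-channel responses before the sum: [B, 3, 3, 3] -> [B, 1, 3, 3, 3]
per_channel = F.conv2d(x, kernel.view(3, 1, 3, 3), groups=3).unsqueeze(1)

print(vanilla.shape)      # torch.Size([2, 1, 3, 3])
print(per_channel.shape)  # torch.Size([2, 1, 3, 3, 3])
print(per_channel[0, 0, :, 0, 0])  # one value per input channel: 9, 18, 27
```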

Thanks a lot!

Regards,
David

ptrblck · December 31, 2019, 3:16am

Wouldn’t my code snippet yield exactly this?
Have a look at this comparison with a manual approach, where each kernel is used on a single input channel:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Grouped approach
conv = nn.Conv2d(3, 3, 3, groups=3, bias=False)
x = torch.randn(10, 3, 64, 64)
output = conv(x)

# Compare with manual approach: apply kernel idx to input channel idx only
kernels = conv.weight
output_manual = []
for idx in range(3):
    kernel = kernels[idx:idx+1]  # [1, 1, 3, 3]
    x_c = x[:, idx:idx+1]        # [10, 1, 64, 64]
    out = F.conv2d(x_c, kernel)
    output_manual.append(out)
output_manual = torch.cat(output_manual, dim=1)

print((output_manual - output).abs().max())
> tensor(4.7684e-07, grad_fn=<MaxBackward1>)

Let me know if I still misunderstand the use case.

DavidWRomero · December 31, 2019, 9:06am

Hi @ptrblck, sorry about the confusion, you are completely right! That is exactly what your proposed approach does.
Thanks for your fast reply.

Best Regards and Happy New Year,
David

xyang35 · April 16, 2020, 4:23pm

If I understand @Saeed_Izadi1 correctly, I think the correct way to achieve that should be something like:

conv = nn.Conv2d(3, 9, 3, groups=3)