Convolution operation without the final summation


Is there any way to perform a vanilla convolution operation but without the final summation? Assume that we have a feature map, X, of size [B, 3, 64, 64] and a single kernel of size [1, 3, 3, 3]. A vanilla convolution gives a feature map of size [B, 1, 62, 62], while I’m after a way to get a feature map of size [B, 3, 62, 62], i.e. just before collapsing/summing all the convolutional channels into a single feature map.


How would you like to perform the reduction at each step?
Generally, you could unfold the input into 3x3x3 patches, perform the multiplication with the kernel, (sum the result,) and fold/reshape to the output shape. Since you are not performing the sum, you would have overlapping patches, and I’m not sure how you would like to reduce/reshape them back.
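As a rough sketch of that unfold-based idea (sizes taken from the question above; only the sum over the channel dimension is skipped, the sum over the spatial kernel positions is kept):

```python
import torch
import torch.nn as nn

# Sketch: unfold into 3x3 patches, multiply with the kernel, and sum only
# over the spatial kernel positions, keeping the channel dimension.
x = torch.randn(2, 3, 64, 64)        # [B, 3, 64, 64]
weight = torch.randn(1, 3, 3, 3)     # single [1, 3, 3, 3] kernel

patches = nn.functional.unfold(x, kernel_size=3)   # [B, 3*3*3, 62*62]
patches = patches.view(2, 3, 9, 62 * 62)           # [B, C, k*k, L]
w = weight.view(1, 3, 9, 1)                        # broadcast over B and L

out = (patches * w).sum(dim=2)                     # [B, 3, 62*62], no channel sum
out = out.view(2, 3, 62, 62)                       # desired [B, 3, 62, 62]

# summing over the channel dim recovers the vanilla convolution [B, 1, 62, 62]
vanilla = out.sum(dim=1, keepdim=True)
```

Since the channel sum is only postponed, `vanilla` should match `F.conv2d(x, weight)` up to floating-point error.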

Thanks for the reply.

I want to avoid the reduction across the channels, not the spatial multiplication.
So each 3x3x3 kernel should give three feature maps, instead of merging them into a single feature map in the output.

In that case, the groups argument should yield the expected results using groups=in_channels:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 3, groups=3)
x = torch.randn(10, 3, 64, 64)
output = conv(x)

I’m already familiar with this option but using groups is not actually what I need.

The operation you wrote is just performing convolution with a single 3x3x3 kernel, while I need the same operation for, say, 32 different kernels.

It’s a bit hard to explain what I am after.

Thanks for the time

Try to give an example with simple tensors and simple kernels, along with the expected results, so we can see what you want to do.

Hi @Saeed_Izadi1, did you find a solution to this problem?

@ptrblck What Saeed was after is the following:
A normal convolution between a [Bx3x5x5] input and a [1x3x3x3] kernel produces a [Bx1x3x3] response. The reason is that after performing the spatial convolution channel-wise, i.e. along the last two indices (producing a tensor of shape [Bx1x3x3x3]), the channel-wise responses are summed up into a single channel, so the convolution yields a [Bx1x3x3] tensor. Is there a way to obtain access to the intermediate [Bx1x3x3x3] tensor?
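For what it’s worth, one way to sketch this (purely an illustration, with shapes taken from the description above) is to split the kernel into its three channel slices and run a grouped convolution, which yields exactly those per-channel responses before the sum:

```python
import torch
import torch.nn.functional as F

# Sketch: split the [1, 3, 3, 3] kernel into three [1, 1, 3, 3] slices and
# convolve channel-wise via groups=3 to get the per-channel responses.
B = 2
x = torch.randn(B, 3, 5, 5)
weight = torch.randn(1, 3, 3, 3)

per_channel = F.conv2d(x, weight.view(3, 1, 3, 3), groups=3)  # [B, 3, 3, 3]
per_channel = per_channel.unsqueeze(1)                        # [B, 1, 3, 3, 3]

# summing over the channel dim reproduces the vanilla convolution [B, 1, 3, 3]
vanilla = per_channel.sum(dim=2)
```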

Thanks a lot! :slight_smile:


Wouldn’t my code snippet yield exactly this? :thinking:
Have a look at this comparison with a manual approach, where each kernel is used on a single input channel:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Grouped approach
conv = nn.Conv2d(3, 3, 3, groups=3, bias=False)
x = torch.randn(10, 3, 64, 64)
output = conv(x)

# Compare with the manual approach, where each kernel slice is applied
# to a single input channel
kernels = conv.weight  # [3, 1, 3, 3] with groups=3
output_manual = []
for idx in range(3):
    kernel = kernels[idx:idx+1]
    input = x[:, idx:idx+1]
    out = F.conv2d(input, kernel)
    output_manual.append(out)
output_manual = torch.cat(output_manual, dim=1)

print((output_manual - output).abs().max())
> tensor(4.7684e-07, grad_fn=<MaxBackward1>)

Let me know, if I still misunderstand the use case. :slight_smile:

Hi @ptrblck sorry about the confusion, you are completely right! That is exactly what your proposed approach does.
Thanks for your fast reply :slight_smile:

Best Regards and Happy New Year,


If I understand @Saeed_Izadi1 correctly, I think the correct way to achieve that should be something like:

conv = nn.Conv2d(3, 9, 3, groups=3)
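Extending that idea, here is a possible sketch for the 32-kernel case mentioned earlier (the layout of the grouped weight, i.e. which filter belongs to which “kernel”, is an assumption here):

```python
import torch
import torch.nn as nn

# Sketch: with groups=in_channels, each input channel gets its own set of
# K single-channel filters, so the output can be reshaped into per-kernel,
# per-channel responses [B, K, C, H', W'].
B, C, K = 2, 3, 32
x = torch.randn(B, C, 64, 64)

conv = nn.Conv2d(C, C * K, 3, groups=C, bias=False)     # weight: [C*K, 1, 3, 3]
out = conv(x)                                           # [B, C*K, 62, 62]
out = out.view(B, C, K, 62, 62).permute(0, 2, 1, 3, 4)  # [B, K, C, 62, 62]

# out[:, k, c] is the response of input channel c to "kernel" k before the
# usual sum over channels; out.sum(dim=2) gives the vanilla [B, K, 62, 62]
```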