Understanding the Conv2d groups parameter

Hi,

I’m trying to build a 2-D convolutional layer for 3-channel images that applies a different convolution to each channel. This led me to investigate the groups parameter of nn.Conv2d. If I’m not mistaken, to do this with a 1x5 filter I should simply create a Conv2d layer along these lines:

conv_layer = nn.Conv2d(3, 3, (1, 5), groups=3)
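
As a quick sanity check (if I’m reading the docs right, the weight layout is (out_channels, in_channels/groups, kH, kW)), groups=3 should indeed give one single-channel 1x5 filter per channel:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, (1, 5), groups=3)
print(conv.weight.shape)   # torch.Size([3, 1, 1, 5]) -- one 1x5 filter per channel
print(conv.bias.shape)     # torch.Size([3]) -- one bias per channel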

I ran a small test to confirm this idea, but the two outputs I expected to be identical came out different:

import torch
import torch.nn as nn

I = torch.randn(10, 3, 4, 5)

conv1 = nn.Conv2d(3, 3, (1, 5), groups=3)   # one 1x5 filter per input channel
conv2 = nn.Conv2d(1, 1, (1, 5), groups=1)   # single-channel reference layer

with torch.no_grad():
    conv2.weight.copy_(conv1.weight[0:1])   # copy channel 0's filter into conv2

out1 = conv1(I)[0, 0, ...]
out2 = conv2(I[:, 0:1, ...])[0, 0, ...]

print(torch.allclose(out1, out2))
> False

I’m copying the weights for the first channel of the conv1 layer into the single-channel conv2 one, and applying both layers to the same input I. What I’m comparing is the first channel of the first sample in each output, and ideally they should be the same.

Could anyone help me understand where I’m mistaken?
Thanks!
Marc

Hi Marc!

Your understanding of groups does seem to be correct. However,
in your test you’ve overlooked Conv2d’s bias.

You can either turn bias off (bias = False) or copy bias over
from conv1 to conv2, along with weight:

>>> import torch
>>> import torch.nn as nn
>>>
>>> I = torch.randn(10,3,4,5)
>>>
>>> conv1 = nn.Conv2d(3,3,(1,5),groups=3)
>>> conv2 = nn.Conv2d(1,1,(1,5),groups=1)
>>>
>>> with torch.no_grad():
...     conv2.weight.copy_(conv1.weight[0:1])
...
Parameter containing:
tensor([[[[-0.3698,  0.0243,  0.3121,  0.0644,  0.2732]]]], requires_grad=True)
>>> out1 = conv1(I)[0,0,...]
>>> out2 = conv2(I[:,0:1,...])[0,0,...]
>>>
>>> print(torch.allclose(out1,out2))
False
>>>
>>> with torch.no_grad():
...     conv2.bias.copy_(conv1.bias[0])
...
Parameter containing:
tensor([0.1049], requires_grad=True)
>>> out1 = conv1(I)[0,0,...]
>>> out2 = conv2(I[:,0:1,...])[0,0,...]
>>>
>>> print(torch.allclose(out1,out2))
True
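
For completeness, here is the bias = False route as a plain script (a minimal
sketch; with the bias turned off, copying weight alone is enough):

import torch
import torch.nn as nn

I = torch.randn(10, 3, 4, 5)

# no bias term, so only the weights need to match
conv1 = nn.Conv2d(3, 3, (1, 5), groups=3, bias=False)
conv2 = nn.Conv2d(1, 1, (1, 5), bias=False)

with torch.no_grad():
    conv2.weight.copy_(conv1.weight[0:1])

out1 = conv1(I)[:, 0]           # channel 0 of the grouped conv
out2 = conv2(I[:, 0:1])[:, 0]   # the single-channel conv applied to channel 0

print(torch.allclose(out1, out2))   # True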

Best.

K. Frank

That makes a lot of sense! I can’t believe I overlooked the bias. I’m still a beginner in DL and PyTorch, so I guess that explains it.

Thanks!