Question about group convolution

Alpha · April 26, 2018, 10:05am

Hi,

I read the docs on the groups argument of Conv2d().

E.g. if I use groups=10, does it mean that there are 10 convolution layers side by side and that the 10 layers share the same parameters?

If so, is there an elegant way to use 10 layers with different parameters?
I.e.:
I have a tensor of size [batch_size, channel=100, H, W]
and I want to have 5 conv layers, each of which looks at only 20 of the 100 channels. How can I do this?
Must I use a slice operation? Is there a better way?

Thank you in advance!

ptrblck · April 26, 2018, 4:35pm

I think for your use case you can just use groups=5:

import torch.nn as nn

# 100 input channels split into 5 groups of 20; one output filter per group
conv = nn.Conv2d(
    in_channels=100,
    out_channels=5,
    kernel_size=3,
    stride=1,
    padding=1,
    groups=5)
print(conv.weight.shape)
# torch.Size([5, 20, 3, 3])

Each of the 5 filters will use only its own 20 input channels and create one output channel.
Leaving groups=1 gives you torch.Size([5, 100, 3, 3]), which means each filter will use all 100 input channels.
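
To make the grouping concrete, here is a minimal sketch that compares the grouped layer against 5 separate single-filter convolutions applied to 20-channel slices of the input. The input size (2, 100, 8, 8), the seed, and the variable names are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(100, 5, kernel_size=3, stride=1, padding=1, groups=5)
x = torch.randn(2, 100, 8, 8)  # illustrative input size

# Build 5 independent single-filter convs, one per 20-channel slice,
# and copy the matching filter and bias out of the grouped layer.
separate = []
for g in range(5):
    c = nn.Conv2d(20, 1, kernel_size=3, stride=1, padding=1)
    with torch.no_grad():
        c.weight.copy_(conv.weight[g:g + 1])  # shape [1, 20, 3, 3]
        c.bias.copy_(conv.bias[g:g + 1])
    separate.append(c)

chunks = torch.split(x, 20, dim=1)  # 5 tensors of shape [2, 20, 8, 8]
out_separate = torch.cat(
    [c(chunk) for c, chunk in zip(separate, chunks)], dim=1)
out_grouped = conv(x)

# The grouped layer and the 5 sliced layers should agree up to float error.
same = torch.allclose(out_grouped, out_separate, atol=1e-4)
# Each filter has its own randomly initialized weights.
filters_differ = not torch.equal(conv.weight[0], conv.weight[1])
print(out_grouped.shape, same, filters_differ)
```

If the two outputs match, the grouped layer behaves like 5 side-by-side convolutions, each with its own filter, so no manual slicing is needed.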

Alpha · April 27, 2018, 2:40am

Thank you very much.

When I use groups = 5, I understand that each of the 5 filters uses just 20 input channels.

But as conv.weight.shape is [5, 20, 3, 3], does it mean that the 5 filters share weights?

DanielLCH · September 21, 2018, 6:26am

You can find the answer in the source code.

Guodong_Zhang · October 28, 2018, 3:53pm

Have you figured it out?

Jayan-K-Duggal · March 20, 2019, 5:04am

@ptrblck_de I want to know how group convolution differs from a simple convolution. What effect does it have on model size, speed and accuracy?

ptrblck · March 20, 2019, 3:18pm

The number of parameters in a grouped convolution will most likely differ, e.g. in the example posted above you see that each kernel has 20 input channels due to the 5 groups instead of 100 as in a vanilla convolution.
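
As a rough illustration of the difference in size, the following sketch counts the parameters of the grouped layer from the example above and of its groups=1 counterpart. The helper name n_params is hypothetical.

```python
import torch.nn as nn


def n_params(module):
    """Total number of learnable parameters (weights and biases)."""
    return sum(p.numel() for p in module.parameters())


grouped = nn.Conv2d(100, 5, kernel_size=3, padding=1, groups=5)
vanilla = nn.Conv2d(100, 5, kernel_size=3, padding=1, groups=1)

print(n_params(grouped))  # 5 * 20 * 3 * 3 weights + 5 biases = 905
print(n_params(vanilla))  # 5 * 100 * 3 * 3 weights + 5 biases = 4505
```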

I’m not sure you can generalize the speed, since depending on e.g. the kernel size different algorithms might be used by cuDNN, so that even a model with more operations (on paper) might run faster than other optimized models.

Basically the same applies to the accuracy, as the model architecture is problem dependent, so you would just have to try out different approaches or stick to a good paper claiming good results using grouped/non-grouped convolutions.

lan2720 · June 30, 2019, 1:08am

Nope.
For example, if in_c = 10, out_c = 20, in_w = 64, in_h = 64 and group = 10, each input channel (whose shape is (64, 64, in_c//group = 1)) will have its own independent filter (whose shape is, for example, (5, 5, 1)), and convolving the two gives (60, 60, 1). So group = 10 will generate (60, 60, group). If out_c = group = 10, this is the final output. But if out_c is a multiple of group, like 20, 30, etc., each group produces out_c//group feature maps, which are concatenated to (60, 60, out_c). The total number of filter parameters = group * (5 * 5 * in_c//group) * (out_c//group) = 5 * 5 * in_c * (out_c//group). All of the weights are different instead of being shared. This op is used in depthwise separable convolutions, as in MobileNet. When we want to separate the depth (channels), group = in_c is what we want.
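
The numbers in this example can be checked with a short sketch. Note that PyTorch stores tensors channels-first, so the shapes below appear as [N, C, H, W] rather than (H, W, C); the batch size of 1 and bias=False are assumptions made to keep the count to filter weights only.

```python
import torch
import torch.nn as nn

in_c, out_c, groups, k = 10, 20, 10, 5
conv = nn.Conv2d(in_c, out_c, kernel_size=k, groups=groups, bias=False)

x = torch.randn(1, in_c, 64, 64)
y = conv(x)

# Weight layout is [out_c, in_c // groups, k, k].
print(conv.weight.shape)  # torch.Size([20, 1, 5, 5])
# No padding, so 64 - 5 + 1 = 60 in each spatial dimension.
print(y.shape)            # torch.Size([1, 20, 60, 60])

# Parameter count from the formula: 5 * 5 * in_c * (out_c // group).
expected = k * k * in_c * (out_c // groups)
print(conv.weight.numel(), expected)  # 500 500
```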

Sanjayvarma11 · March 2, 2020, 7:14am

Hi sir, in a depthwise separable convolution we apply a 3x3 convolution followed by a 1x1 convolution. In nn.Conv2d we specify the number of groups, which is where the channels get separated. Can you tell me where the combining of the channels (i.e. the 1x1 convolution) takes place in nn.Conv2d?

ptrblck · March 2, 2020, 7:19am

You would have to use a separate layer with a kernel size of 1 to combine the features.

Sanjayvarma11 · March 2, 2020, 8:11am

So suppose we need to obtain 512 features from 256 features using a depthwise separable convolution.
Could you please give a code example of a depthwise separable convolution? Thank you for answering.
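
A minimal sketch of such a layer, assuming a 3x3 depthwise kernel with padding 1 and no normalization or activation in between (real architectures such as MobileNet typically add those). The class name DepthwiseSeparableConv and the 32x32 input size are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (groups=in_channels) followed by a pointwise 1x1 conv."""

    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise step: one filter per input channel, channels stay separate.
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            padding=padding, groups=in_channels)
        # Pointwise step: a separate 1x1 layer mixes the channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


layer = DepthwiseSeparableConv(256, 512)
x = torch.randn(1, 256, 32, 32)
y = layer(x)
print(y.shape)  # torch.Size([1, 512, 32, 32])
```

The channel mixing happens only in the second (1x1) layer; nn.Conv2d with groups=in_channels performs just the depthwise part.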

malcolmyanguci · May 16, 2020, 10:28am

Hi. What if I want to construct filters with different numbers of input channels? For example, instead of 5 filters that each take 20 channels, I want to do something like [1,10,3,3], [1,20,3,3], [1,30,3,3], [1,30,3,3], [1,10,3,3]. Do you think this is possible?

Besides, I hope you could also take a look at my other question. Much appreciated.
https://discuss.pytorch.org/t/how-to-train-the-paralleled-layers/81317

ptrblck · May 16, 2020, 10:42am

I don’t think this approach would work out of the box with the current implementations without splitting the operations or, alternatively, zero-padding the filter kernels in the unwanted input channel dimensions.
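
One way the splitting variant could look is sketched below, assuming the input channels are consumed in order with sizes [10, 20, 30, 30, 10] and each slice gets its own single-filter convolution. The class name UnevenGroupConv and the 16x16 input size are hypothetical; this is an illustration, not a built-in PyTorch feature.

```python
import torch
import torch.nn as nn


class UnevenGroupConv(nn.Module):
    """Applies one single-filter conv per channel slice of unequal size."""

    def __init__(self, group_sizes, kernel_size=3, padding=1):
        super().__init__()
        self.group_sizes = list(group_sizes)
        self.convs = nn.ModuleList([
            nn.Conv2d(c, 1, kernel_size, padding=padding)
            for c in self.group_sizes
        ])

    def forward(self, x):
        # Split along the channel dimension into slices of the given sizes.
        chunks = torch.split(x, self.group_sizes, dim=1)
        outs = [conv(chunk) for conv, chunk in zip(self.convs, chunks)]
        return torch.cat(outs, dim=1)


layer = UnevenGroupConv([10, 20, 30, 30, 10])
x = torch.randn(2, 100, 16, 16)
y = layer(x)

print([tuple(c.weight.shape) for c in layer.convs])
print(y.shape)  # torch.Size([2, 5, 16, 16])
```

This runs the slices as separate convolution calls, so it may be slower than a single grouped convolution.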