Sharing weights between Conv2d groups

Hey guys,

The documentation for the Conv2d module states that inputs and outputs can be grouped, each group with its own set of weights:

groups - controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. At groups=in_channels, each input channel is convolved with its own set of filters (of size out_channels // in_channels).
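A quick way to check the groups=2 description is to copy each half of a grouped layer's weight into two ordinary layers and compare the results. This is a minimal sketch assuming a recent PyTorch (no Variable wrapper); the shapes mirror the experiment in this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 20, 10, 10)

# Grouped convolution: 2 groups, each maps 10 input channels to 20 output channels.
grouped = nn.Conv2d(20, 40, 3, groups=2, bias=False)

# Two independent convolutions, each initialised with one group's slice of the weight.
conv_a = nn.Conv2d(10, 20, 3, bias=False)
conv_b = nn.Conv2d(10, 20, 3, bias=False)
with torch.no_grad():
    conv_a.weight.copy_(grouped.weight[:20])  # group 1: output channels 0..19
    conv_b.weight.copy_(grouped.weight[20:])  # group 2: output channels 20..39

# Each side-by-side convolution sees half the input channels; results are concatenated.
side_by_side = torch.cat([conv_a(x[:, :10]), conv_b(x[:, 10:])], dim=1)

print(grouped.weight.shape)  # torch.Size([40, 10, 3, 3])
print(torch.allclose(grouped(x), side_by_side, atol=1e-5))  # True
```

The weight of a grouped layer has shape (out_channels, in_channels // groups, kH, kW), so the two groups' filters are simply stacked along the first axis.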

I did a small experiment to confirm this works as I expect it to (and btw, awesome framework where you can just whip something like this together in no time at all 🙂):

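import torch
from torch.nn import Conv2d

v = torch.randn(1, 20, 10, 10)  # Variable is no longer needed in PyTorch >= 0.4
v[:, 10:] = 0  # zero out the second half of the input channels (group 2)
c = Conv2d(20, 40, (3, 3), groups=2, bias=False)
print(c.weight.size())  # one 20x10x3x3 weight set per group, stacked
r = c(v)
print(r[:, :20].sum())  # outputs of group 1: non-zero
print(r[:, 20:].sum())  # outputs of group 2: exactly zero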


(40L, 10L, 3L, 3L)
Variable containing:
[torch.FloatTensor of size 1]

Variable containing:
[torch.FloatTensor of size 1]

So the input is indeed treated as two separate groups of inputs, each corresponding to a proportionate number of outputs, and (judging from the number of parameters) each group has its own set of weights.

This is, however, not really what I want to do here. I would like all groups to share the same set of weights (i.e., a 20x10x3x3 tensor in the example).
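One possible way to get a single shared 20x10x3x3 weight while still using one grouped convolution call is to tile that weight once per group and pass it to torch.nn.functional.conv2d. This is a sketch, not an official PyTorch feature: the module name SharedGroupConv2d and the simple random initialisation are assumptions for illustration. Because repeat is differentiable, the gradients from all groups accumulate into the one shared parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedGroupConv2d(nn.Module):
    """Hypothetical module: a grouped Conv2d whose groups all reuse one weight tensor."""

    def __init__(self, in_per_group, out_per_group, kernel_size, groups):
        super().__init__()
        self.groups = groups
        # Single shared weight, e.g. 20x10x3x3 for the example in the post.
        # Simple scaled-normal init, chosen only for illustration.
        self.weight = nn.Parameter(
            torch.randn(out_per_group, in_per_group, kernel_size, kernel_size) * 0.1
        )

    def forward(self, x):
        # Tile the shared filters once per group along the output-channel axis,
        # giving the (groups * out, in, k, k) weight a grouped conv expects.
        w = self.weight.repeat(self.groups, 1, 1, 1)
        return F.conv2d(x, w, groups=self.groups)

conv = SharedGroupConv2d(10, 20, 3, groups=2)
x = torch.randn(1, 20, 10, 10)
y = conv(x)
print(conv.weight.shape)  # torch.Size([20, 10, 3, 3])
print(y.shape)            # torch.Size([1, 40, 8, 8])
```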

I first tried concatenating the outputs of a smaller Conv2d layer applied to each group. It works as expected, but it's considerably slower (because the operations have to run sequentially?): about 10% slower on my potato testing card and 40+% slower with multiple fast GPUs.
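The concatenation approach probably looks roughly like the following; the original code isn't in the post, so this is a reconstruction under that assumption. With G groups it issues G separate convolution calls, one after another, which is a plausible source of the slowdown.

```python
import torch
import torch.nn as nn

groups = 2
shared = nn.Conv2d(10, 20, 3, bias=False)  # one 20x10x3x3 weight used by every group

x = torch.randn(1, 20, 10, 10)
# Split the channels into groups, run the same conv on each, then concatenate.
chunks = torch.chunk(x, groups, dim=1)
y = torch.cat([shared(chunk) for chunk in chunks], dim=1)
print(y.shape)  # torch.Size([1, 40, 8, 8])
```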
I then tried several combinations of 3d kernels with some unsqueeze and view operations for good measure, but they were just as slow and didn’t produce the result I wanted.
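A view/reshape-based variant that avoids the per-group loop is to fold the groups into the batch dimension, run one ordinary Conv2d, and unfold afterwards. This is a sketch under the assumption that the channel count is divisible by the number of groups; the helper name shared_group_conv is hypothetical. It produces the same result as the concatenation approach, but whether it is actually faster on a given GPU is something you would need to measure.

```python
import torch
import torch.nn as nn

def shared_group_conv(x, conv, groups):
    """Hypothetical helper: apply the same `conv` to every channel group of x in one call.

    Assumes x is (N, groups * conv.in_channels, H, W).
    """
    n, c, h, w = x.shape
    # Fold the groups into the batch dimension: (N, G*C, H, W) -> (N*G, C, H, W).
    folded = x.reshape(n * groups, c // groups, h, w)
    out = conv(folded)  # a single convolution call for all groups
    # Unfold: (N*G, O, H', W') -> (N, G*O, H', W'); groups keep their original order.
    _, o, h2, w2 = out.shape
    return out.reshape(n, groups * o, h2, w2)

conv = nn.Conv2d(10, 20, 3, bias=False)  # shared 20x10x3x3 weight
x = torch.randn(4, 20, 10, 10)
y = shared_group_conv(x, conv, groups=2)
print(y.shape)  # torch.Size([4, 40, 8, 8])
```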

Does anyone have an idea on how to share weights between groups?
For now I’m fine with separate weights (it works quite well), but I would really like to test a theory.
However, blocking resources for several days with an architecture I know is inefficient is not something I’m comfortable with 🙂


Hello guys,

I’m a newbie in PyTorch and also interested in the shared-weights topic. If we use groups == in_channels, we know that each input channel gets its own set of filters, and these filters are all different, so I ran a test to clarify this concept.
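The test itself isn't included in the post; a check along these lines is one way to see the groups == in_channels behaviour. This sketch uses small illustrative sizes (4 input channels, 8 output channels) that are assumptions, not values from the thread.

```python
import torch
import torch.nn as nn

# groups == in_channels: each of the 4 input channels gets its own 2 filters.
conv = nn.Conv2d(4, 8, 3, groups=4, bias=False)
print(conv.weight.shape)  # torch.Size([8, 1, 3, 3])

# Filters for channel 0 (weight[0:2]) and channel 1 (weight[2:4]) are separate,
# independently initialised slices, so they are not shared.
print(torch.equal(conv.weight[0:2], conv.weight[2:4]))  # False

# Feed a signal only into input channel 1: only its output channels (2 and 3) respond.
x = torch.zeros(1, 4, 5, 5)
x[:, 1] = torch.randn(5, 5)
y = conv(x)
print(y.abs().sum(dim=(0, 2, 3)) > 0)  # True only at indices 2 and 3
```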

It would be nice to have a feature like shared_by_group (or something similar) in a future release.

How did you solve it, @jfolz?

I’m also interested in this… Is there any news?