The documentation for the Conv2d module states that inputs and outputs can be grouped together, each group with its own set of weights:
groups - controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters (of size out_channels // in_channels).
I did a small experiment to confirm this works as I expect it to (and btw, awesome framework where you can just whip something like this together in no time at all):
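```python
import torch
from torch.autograd import Variable
from torch.nn import Conv2d

# Input with 20 channels; zero out the second half (the second group).
v = Variable(torch.randn((1, 20, 10, 10)))
v[:, 10:] = 0

# Two groups: each maps 10 input channels to 20 output channels.
c = Conv2d(20, 40, (3, 3), groups=2, bias=False)
print(c.weight.size())

r = c(v)
print(r[:, :20].sum())  # outputs of the first group
print(r[:, 20:].sum())  # outputs of the second group (all-zero input)
```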
```
(40L, 10L, 3L, 3L)
Variable containing:
 14.2016
[torch.FloatTensor of size 1]

Variable containing:
 0
[torch.FloatTensor of size 1]
```
So the input is indeed treated as two separate groups of inputs, each corresponding to a proportionate number of outputs, and (judging from the number of parameters) each group has its own set of weights.
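The "two conv layers side by side" reading from the documentation can also be checked directly. The following is only a sketch: the names `left`, `right` and `side_by_side` are made up for illustration, and it assumes a recent PyTorch version where plain tensors can be passed to layers without wrapping them in `Variable`.

```python
import torch
from torch.nn import Conv2d

torch.manual_seed(0)

x = torch.randn(1, 20, 10, 10)
grouped = Conv2d(20, 40, (3, 3), groups=2, bias=False)

# Two independent layers, each taking 10 input channels and producing 20 outputs.
left = Conv2d(10, 20, (3, 3), bias=False)
right = Conv2d(10, 20, (3, 3), bias=False)

# Copy each half of the grouped weight (shape 40x10x3x3) into its own layer.
with torch.no_grad():
    left.weight.copy_(grouped.weight[:20])
    right.weight.copy_(grouped.weight[20:])

# Run each layer on its half of the input channels and concatenate the results.
side_by_side = torch.cat([left(x[:, :10]), right(x[:, 10:])], dim=1)

# Compare against the grouped layer, allowing for small floating-point differences.
groups_match = torch.allclose(grouped(x), side_by_side, atol=1e-4)
print(groups_match)
```

If the comparison holds, the first 20 filters of the grouped weight belong to the first group and the last 20 to the second, which matches the parameter count seen above.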
This is, however, not really what I want to do here. I would like all groups to share the same set of weights (i.e., a 20x10x3x3 tensor in the example).
I first tried applying a smaller Conv2d layer to each group separately and concatenating the outputs. It works as expected, but it's considerably slower (because operations have to be done sequentially?): roughly 10% slower on my potato testing card and 40+% slower with multiple fast GPUs.
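For reference, the concatenation approach could look roughly like the sketch below. The class name `SharedGroupConv2d` and its arguments are hypothetical and chosen for illustration only; it also assumes a recent PyTorch version and equally sized groups.

```python
import torch
from torch import nn


class SharedGroupConv2d(nn.Module):
    """Hypothetical helper: applies one Conv2d to every channel group in turn."""

    def __init__(self, group_in, group_out, kernel_size, groups):
        super().__init__()
        self.groups = groups
        # A single layer, so every group uses the same weights.
        self.conv = nn.Conv2d(group_in, group_out, kernel_size, bias=False)

    def forward(self, x):
        # Split the channels into equal chunks, run the same layer on each
        # chunk one after another, then concatenate along the channel axis.
        chunks = torch.chunk(x, self.groups, dim=1)
        return torch.cat([self.conv(chunk) for chunk in chunks], dim=1)


layer = SharedGroupConv2d(10, 20, (3, 3), groups=2)
x = torch.randn(1, 20, 10, 10)
out = layer(x)
print(layer.conv.weight.size())  # torch.Size([20, 10, 3, 3])
print(out.size())                # torch.Size([1, 40, 8, 8])
```

The Python-level loop over the chunks issues one convolution per group, which is presumably where the extra time goes.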
I then tried several combinations of 3d kernels with some view operations for good measure, but they were just as slow and didn't produce the result I wanted.
Does anyone have an idea on how to share weights between groups?
For now I’m fine with separate weights (it works quite well), but I would really like to test a theory.
However, blocking resources for several days with an architecture I know is inefficient is not something I'm comfortable with.