I’m imagining a scenario where I want to apply a learnable convolution layer to multiple Tensor inputs in a module. I would want the layer to learn from all inputs using the average of the gradients of the shared convolution filter w.r.t. each input.
I see this question has been asked before, so let me expand on it a bit.
Say I have a convolution module that shares weights like this:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedFilterConv2d(nn.Module):
    def __init__(self):
        super(SharedFilterConv2d, self).__init__()
        self.conv = nn.Conv2d(3, 3, 3)
        self.shared_weight = nn.Parameter(self.conv.weight)

    def forward(self, inputs):
        # Apply the same filter to every tensor in the input list.
        outputs = []
        for x in inputs:
            outputs.append(F.conv2d(x, self.shared_weight, ...))
        return torch.cat(outputs, dim=1)
The output of this operation is passed on through the network, and eventually .backward() is called on some value that depends on it. Since the weights need gradients calculated with respect to multiple inputs in one backward pass, what happens to the gradients? Are they averaged when optimizer.step() is called? Should they be averaged? Do the gradients accumulate for all inputs in the input list, even if the input list varies in size?
I am trying to get a grasp on how autograd handles this.
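To make the question concrete, here is a minimal sketch of how the behaviour could be checked empirically. It compares the gradient of a shared weight used on several inputs against the sum and the mean of the per-input gradients. The tensor shapes, the `grad_for` helper, and the use of a plain `.sum()` as the loss are my own assumptions for illustration, not part of the module above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes, chosen only for illustration.
weight = torch.randn(3, 3, 3, 3, requires_grad=True)
inputs = [torch.randn(1, 3, 8, 8) for _ in range(3)]

def grad_for(xs):
    """Gradient of a toy loss w.r.t. `weight` when it is applied to every tensor in xs."""
    outputs = [F.conv2d(x, weight) for x in xs]
    loss = torch.cat(outputs, dim=1).sum()  # toy loss (assumption)
    (grad,) = torch.autograd.grad(loss, weight)
    return grad

# One backward pass through all inputs at once.
grad_all = grad_for(inputs)

# Separate backward passes, one per input.
grad_each = [grad_for([x]) for x in inputs]
summed = torch.stack(grad_each).sum(dim=0)

is_sum = torch.allclose(grad_all, summed, atol=1e-4)
is_mean = torch.allclose(grad_all, summed / len(inputs), atol=1e-4)
print("matches sum of per-input grads: ", is_sum)
print("matches mean of per-input grads:", is_mean)
```

As far as I understand, autograd sums the contributions from every use of a tensor in the graph, and optimizer.step() only consumes whatever ends up in .grad, so the first check should hold for an input list of any length. If an average is wanted instead, dividing the loss by len(inputs) before calling .backward() would be one way to get it.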