Shared parameter model and submodel BP

What is the expected behaviour when a model returns multiple outputs? That is, what if I sometimes want to do BP from o1 and sometimes from o2? I've tried to do BP directly like this.

import torch
import torch.nn as nn
import torch.optim as optim

class mm(nn.Module):
    def __init__(self):
        super(mm, self).__init__()
        self.m = nn.Linear(3, 2)    # produces o1
        self.m2 = nn.Linear(2, 4)   # produces o2 from o1
    def forward(self, input):
        o1 = self.m(input)
        o2 = self.m2(o1)
        return o1, o2

mmodel = mm()
optimizer = optim.SGD(mmodel.parameters(), lr=1, momentum=0.8)
#optimizer = optim.Adam(mmodel.parameters(), lr=1, betas=(0.5, 0.999))
input = torch.randn(1, 3)  # plain tensors carry autograd; Variable is no longer needed
print(mmodel.m.weight.data)
print(mmodel.m2.weight.data)
mmodel.zero_grad()
output1, output = mmodel(input)
output1.backward(torch.ones(1, 2))   # backprop from the first output only
#output.backward(torch.ones(1, 4))   # backprop from the second output instead
optimizer.step()
print(mmodel.m.weight.data)
print(mmodel.m2.weight.data)

When I use momentum-based SGD, it also updates m2's weights, especially if I have already done some updates that backpropagated from output. That is because the optimizer uses the previous momentum buffer to update m2's weights even when its current gradient is zero. Is there a recipe for this? One hacky way I can think of is to check whether each gradient is zero and only update the parameters with nonzero gradients.
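One possible recipe (just a sketch, not an official answer) is to give each submodule its own optimizer and step only the one whose output you backpropagated from. Each optimizer keeps its own momentum buffers and never sees the other submodule's parameters, so m2 is left untouched when you only call backward on o1. This reuses the mm class from the snippet above; the names optimizer_m and optimizer_m2 are made up for illustration.

import torch
import torch.optim as optim

mmodel = mm()
# Hypothetical split: one optimizer per submodule, so momentum buffers never mix.
optimizer_m = optim.SGD(mmodel.m.parameters(), lr=1, momentum=0.8)
optimizer_m2 = optim.SGD(mmodel.m2.parameters(), lr=1, momentum=0.8)

x = torch.randn(1, 3)
mmodel.zero_grad()
o1, o2 = mmodel(x)
o1.backward(torch.ones(1, 2))
optimizer_m.step()        # only m is updated; m2's momentum buffer is never touched

# Later, when training from o2 instead:
mmodel.zero_grad()
o1, o2 = mmodel(x)
o2.backward(torch.ones(1, 4))
optimizer_m.step()        # o2 also depends on m, so m receives gradients here
optimizer_m2.step()

On newer PyTorch versions there is also optimizer.zero_grad(set_to_none=True), which resets gradients to None instead of zero; SGD skips parameters whose grad is None, so no momentum-only update is applied to m2 in that case.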
