What is the expected behaviour of a module that returns multiple outputs?
That is, what if I sometimes want to do backprop from out1 and sometimes from out2?
I've tried to do backprop directly, like this:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

class mm(nn.Module):
    def __init__(self):
        super(mm, self).__init__()
        self.m = nn.Linear(3, 2)
        self.m2 = nn.Linear(2, 4)

    def forward(self, input):
        o1 = self.m(input)   # first output, produced by self.m alone
        o2 = self.m2(o1)     # second output, depends on both layers
        return o1, o2

mmodel = mm()
optimizer = optim.SGD(mmodel.parameters(), lr=1, momentum=0.8)
#optimizer = optim.Adam(mmodel.parameters(), lr=1, betas=(0.5, 0.999))
input = Variable(torch.randn(1, 3))

print(mmodel.m.weight.data)
print(mmodel.m2.weight.data)

mmodel.zero_grad()
output1, output = mmodel(input)
output1.backward(torch.ones(1, 2))  # backprop from the intermediate output only
#output.backward(torch.ones(1, 4))
optimizer.step()

print(mmodel.m.weight.data)
print(mmodel.m2.weight.data)
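A quick way to see what is going on (a minimal sketch, assuming the code above has just run):

# Which parameters received gradients? After output1.backward(...), only
# self.m's parameters get nonzero grads; self.m2's stay None on the very
# first backward, or all-zero once zero_grad() has materialized them.
for name, p in mmodel.named_parameters():
    print(name, None if p.grad is None else p.grad.abs().sum())

# SGD keeps its momentum buffers in optimizer.state. Once a step has been
# taken through `output`, m2's buffers are nonzero, and momentum SGD keeps
# applying them even when the current gradient is zero.
for p in mmodel.m2.parameters():
    state = optimizer.state.get(p, {})
    if 'momentum_buffer' in state:
        print(state['momentum_buffer'].abs().sum())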
When I use momentum-based SGD, it also updates m2's weights, especially if I have previously done some updates through output (that is, called output.backward()). That is because the optimizer applies the previously accumulated momentum to m2's weights even though their current gradient is zero. Is there any recipe for this? One hacky way I can think of is to check whether each parameter's gradient is zero and only update the ones with nonzero gradients.
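Here is a minimal sketch of that hack, assuming the setup above; step_skip_zero_grads is my own helper name, not a PyTorch API. It relies on SGD's step() skipping any parameter whose .grad is None:

def step_skip_zero_grads(optimizer):
    # Hide all-zero gradients from the optimizer so that stale momentum
    # cannot move parameters that did not take part in this backward pass.
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is not None and p.grad.abs().sum() == 0:
                p.grad = None
    optimizer.step()

Used in place of optimizer.step() above:

mmodel.zero_grad()
output1, output = mmodel(input)
output1.backward(torch.ones(1, 2))  # only self.m gets nonzero grads
step_skip_zero_grads(optimizer)     # self.m2 is left untouched

One caveat: the momentum buffers themselves are not cleared, so m2's stale momentum will still be mixed in the next time it receives a real gradient.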