I want to compute gradients from one loss Variable (let's say loss1) for only a part of the network and accumulate them in the
.grad attribute of the variables in that sub-network.
The rest of the network should not receive gradients from that loss Variable (loss1), but it does receive gradients from some other loss Variable (loss2).
torch.autograd.grad() does not accumulate the grads in the
.grad attribute of the input variables. How do I make them accumulate?
Does the following make sense?
grads = torch.autograd.grad(outputs=loss1, inputs=subnet_params, create_graph=True)
for g, v in zip(grads, subnet_params):
    if v.grad is None:  # .grad starts out as None, so guard the first pass
        v.grad = g.clone()
    else:
        v.grad += g
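For what it's worth, here is a minimal self-contained sketch of that accumulation with toy parameters (the names and losses are illustrative, not from any real network):

```python
import torch

# Toy stand-ins for the sub-network's parameters (illustrative only)
subnet_params = [torch.randn(3, requires_grad=True) for _ in range(2)]
loss1 = sum((p ** 2).sum() for p in subnet_params)

# torch.autograd.grad returns the gradients but leaves .grad untouched
grads = torch.autograd.grad(loss1, subnet_params, create_graph=True)
assert all(p.grad is None for p in subnet_params)

# Accumulate manually, guarding the initial None case.
# detach() keeps the stored grads out of the autograd graph, which is what
# an optimizer expects; if a second loss must differentiate through these
# gradients, keep g attached instead of detaching.
for g, v in zip(grads, subnet_params):
    v.grad = g.detach().clone() if v.grad is None else v.grad + g.detach()
```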
Maybe the simplest way is to create two nn.Modules, one for each part of your network. Each module is then associated with an optimizer that optimizes one specific loss:
part1 = Part1(); part2 = Part2()
optim1 = torch.optim.SGD(part1.parameters(), lr=lr1)
optim2 = torch.optim.SGD(part2.parameters(), lr=lr2)
input, target = next_data()
x = part1(input)
output = part2(x)
loss1 = your_loss1(output, target)
loss2 = your_loss2(output, target)
optim1.zero_grad(); optim2.zero_grad()
loss1.backward(retain_graph=True)  # keep the graph for the second backward
optim1.step()  # update part1 to optimize loss1
optim2.zero_grad()  # discard loss1's gradients before computing loss2's
loss2.backward()
optim2.step()  # update part2 to optimize loss2
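A runnable version of that sketch, with Linear layers and made-up losses standing in for Part1/Part2 and your_loss1/your_loss2 (all names here are placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
part1 = nn.Linear(4, 4)   # stand-in for Part1
part2 = nn.Linear(4, 1)   # stand-in for Part2
optim1 = torch.optim.SGD(part1.parameters(), lr=0.1)
optim2 = torch.optim.SGD(part2.parameters(), lr=0.1)

inp, target = torch.randn(8, 4), torch.randn(8, 1)
out = part2(part1(inp))
loss1 = ((out - target) ** 2).mean()  # stand-in for your_loss1
loss2 = out.abs().mean()              # stand-in for your_loss2

optim1.zero_grad(); optim2.zero_grad()
loss1.backward(retain_graph=True)  # fills .grad of both parts
optim1.step()                      # update part1 with loss1's gradients
optim2.zero_grad()                 # drop loss1's gradients from part2
loss2.backward()                   # part2 now holds only loss2's gradients
optim2.step()                      # update part2 with loss2's gradients
```

Note that each backward call accumulates into every upstream parameter, which is why part2's gradients are zeroed between the two backward passes.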
Thanks for the reply. The problem is that the two modules are interlinked: the second loss function uses the
.grad of the variables that were populated by the backward pass of the first loss function.
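If loss2 really is built from the gradients of loss1, one way to express that (a sketch with a single toy parameter, names illustrative) is to keep the first backward pass differentiable via create_graph=True, so the gradient tensors themselves can appear in a second loss:

```python
import torch

w = torch.randn(3, requires_grad=True)
loss1 = (w ** 2).sum()

# create_graph=True makes the returned gradient differentiable,
# so it can participate in a second loss (double backward)
(g,) = torch.autograd.grad(loss1, w, create_graph=True)  # g == 2*w

loss2 = (g ** 2).sum()  # a loss built from loss1's gradient
loss2.backward()        # accumulates d(loss2)/dw into w.grad
```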