I’m trying to train a network with multiple heads branching off a common body. I want to understand what the two following pieces of code are actually doing, and how they differ.
o1, o2 = model(input)  # outputs of the two heads
l1 = loss_fn(o1, labels)
l1.backward(retain_graph=True)
l2 = loss_fn(o2, labels)
l2.backward(retain_graph=True)
optimizer.step()
and this:
o1, o2 = model(input)  # outputs of the two heads
l1 = loss_fn(o1, labels)
l2 = loss_fn(o2, labels)
torch.autograd.backward([l1, l2])
optimizer.step()
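For what it's worth, here is a minimal runnable sketch I put together to compare the two styles (the shared-body/two-head model, the data, and the loss here are my own placeholder assumptions, not your actual setup). Since `.backward()` accumulates into `.grad`, both styles should end up with the gradient of `l1 + l2` in every parameter:

```python
import torch

torch.manual_seed(0)

# Placeholder stand-in for a shared body with two heads (assumption).
body = torch.nn.Linear(4, 8)
head1 = torch.nn.Linear(8, 2)
head2 = torch.nn.Linear(8, 2)
params = list(body.parameters()) + list(head1.parameters()) + list(head2.parameters())

x = torch.randn(5, 4)
labels = torch.randint(0, 2, (5,))
loss_fn = torch.nn.CrossEntropyLoss()

def grads_two_backwards():
    for p in params:
        p.grad = None
    h = body(x)
    o1, o2 = head1(h), head2(h)
    loss_fn(o1, labels).backward(retain_graph=True)  # keep graph: l2 still needs the shared body
    loss_fn(o2, labels).backward()                   # second pass adds into the existing .grad
    return [p.grad.clone() for p in params]

def grads_joint_backward():
    for p in params:
        p.grad = None
    h = body(x)
    o1, o2 = head1(h), head2(h)
    torch.autograd.backward([loss_fn(o1, labels), loss_fn(o2, labels)])
    return [p.grad.clone() for p in params]

same = all(torch.allclose(a, b) for a, b in zip(grads_two_backwards(), grads_joint_backward()))
print(same)
```

In my quick check the gradients match, which suggests the two snippets compute the same update and differ mainly in how many backward passes traverse the shared body.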