Training a network with multiple heads

I’m trying to train a network with multiple heads branching from a common body. I want to understand what the following two pieces of code are actually doing, and how they differ.

o1, o2 = model(input)  # outputs of the two heads

l1 = loss_fn(o1, labels)
l1.backward(retain_graph=True)
l2 = loss_fn(o2, labels)
l2.backward(retain_graph=True)

optimizer.step()

and this:

o1, o2 = model(input)  # outputs of the two heads

l1 = loss_fn(o1, labels)
l2 = loss_fn(o2, labels)

torch.autograd.backward([l1, l2])

optimizer.step()
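For context, here is a minimal runnable sketch comparing the two variants on a hypothetical toy two-head model (the model, shapes, and loss are my own assumptions, not from the snippets above). It checks whether calling `backward()` twice with `retain_graph=True` accumulates the same gradients in the shared body as a single `torch.autograd.backward([l1, l2])` call:

```python
import torch
import torch.nn as nn

# Hypothetical toy model: a shared body feeding two linear heads.
class TwoHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.head1 = nn.Linear(8, 3)
        self.head2 = nn.Linear(8, 3)

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.head1(h), self.head2(h)

torch.manual_seed(0)
x = torch.randn(5, 4)
labels = torch.randint(0, 3, (5,))
loss_fn = nn.CrossEntropyLoss()

# Variant 1: two separate backward calls.
# retain_graph=True on the first call keeps the shared body's graph
# alive for the second call; gradients accumulate into .grad.
m1 = TwoHead()
o1, o2 = m1(x)
loss_fn(o1, labels).backward(retain_graph=True)
loss_fn(o2, labels).backward()

# Variant 2: one combined backward pass over both losses,
# starting from the same initial weights.
m2 = TwoHead()
m2.load_state_dict(m1.state_dict())
o1, o2 = m2(x)
torch.autograd.backward([loss_fn(o1, labels), loss_fn(o2, labels)])

# Compare the accumulated gradients on the shared body.
same = torch.allclose(m1.body.weight.grad, m2.body.weight.grad)
print(same)
```

If the two variants are indeed equivalent up to floating-point accumulation order, this prints `True`; the combined call traverses the shared body once instead of twice, which is the practical difference between the snippets.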