Yes, accumulation is the default behavior: every call to .backward() adds to the existing .grad buffers unless you clear them first with optimizer.zero_grad().
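For illustration, here is a minimal sketch (the linear layer and tensor shapes are just placeholder assumptions) showing that calling .backward() twice without zero_grad() sums the gradients:

import torch
import torch.nn as nn

layer = nn.Linear(4, 1)          # placeholder module
x = torch.randn(2, 4)

layer(x).sum().backward()
g1 = layer.weight.grad.clone()   # gradient after the first backward

layer(x).sum().backward()        # no zero_grad() in between
print(torch.allclose(layer.weight.grad, 2 * g1))  # True: gradients were accumulated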
The combined version and the separate version are both correct, and they produce the same gradients:
# combined version: backprop the summed loss once
loss = loss1 + loss2
loss.backward()

# separate version: retain the graph so the second backward can reuse the shared part
loss1.backward(retain_graph=True)
loss2.backward()
Therefore, you don’t need to accumulate the gradients of the shared part yourself or perform the backward pass manually.
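As a quick sanity check, here is a sketch (the shared encoder and the two heads are made-up placeholders, not anything from your model) showing that both versions leave identical gradients on the shared parameters:

import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(4, 4)                      # placeholder shared part
head1, head2 = nn.Linear(4, 1), nn.Linear(4, 1)
x = torch.randn(3, 4)

# combined version
h = shared(x)
(head1(h).sum() + head2(h).sum()).backward()
g_combined = shared.weight.grad.clone()

# separate version
shared.zero_grad()
h = shared(x)
head1(h).sum().backward(retain_graph=True)    # keep the shared graph alive for the second backward
head2(h).sum().backward()
print(torch.allclose(g_combined, shared.weight.grad))  # True: same gradients on the shared part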