Yes, accumulation is the default behavior: every call to .backward() adds to the existing .grad buffers unless you clear them first with optimizer.zero_grad().
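For illustration, here is a minimal sketch (the linear layer and tensor shapes are just placeholder assumptions) showing that calling .backward() twice without zero_grad() sums the gradients:

import torch
import torch.nn as nn

layer = nn.Linear(4, 1)          # placeholder module
x = torch.randn(2, 4)

layer(x).sum().backward()
g1 = layer.weight.grad.clone()   # gradient after the first backward

layer(x).sum().backward()        # no zero_grad() in between
print(torch.allclose(layer.weight.grad, 2 * g1))  # True: gradients were accumulated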
The combined version and the separate version are both correct, and they produce the same gradients:
# combined version: backprop the summed loss once
loss = loss1 + loss2
loss.backward()

# separate version: retain the graph so the second backward can reuse the shared part
loss1.backward(retain_graph=True)
loss2.backward()
Therefore, you don’t need to accumulate the gradients of the shared part yourself or perform the backward pass manually.
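As a quick sanity check, here is a sketch (the shared encoder and the two heads are made-up placeholders, not anything from your model) showing that both versions leave identical gradients on the shared parameters:

import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(4, 4)                      # placeholder shared part
head1, head2 = nn.Linear(4, 1), nn.Linear(4, 1)
x = torch.randn(3, 4)

# combined version
h = shared(x)
(head1(h).sum() + head2(h).sum()).backward()
g_combined = shared.weight.grad.clone()

# separate version
shared.zero_grad()
h = shared(x)
head1(h).sum().backward(retain_graph=True)    # keep the shared graph alive for the second backward
head2(h).sum().backward()
print(torch.allclose(g_combined, shared.weight.grad))  # True: same gradients on the shared part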