I did something like this:
```python
data = data.to(device)
label = label.to(device)

# forward + backward for model1
out1 = model1(data, label)
loss1 = model1.loss
loss1.backward()

# forward + backward for model2 on the same tensors
out2 = model2(data, label)
loss2 = model2.loss
loss2.backward()
```
model1 and model2 are identical models. With the code above, model2 did not learn correctly at all.
But if I copy the batch to the GPU twice and train model1 and model2 on the two copies respectively, both models train fine.
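For clarity, here is a minimal self-contained sketch of that workaround. The model (`TinyNet`) and tensor names are hypothetical stand-ins for my actual setup; the point is only that each model gets its own copy of the batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyNet(nn.Module):
    """Illustrative model that stores its loss as an attribute, like mine."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x, y):
        out = self.fc(x)
        self.loss = nn.functional.cross_entropy(out, y)
        return out

device = "cuda" if torch.cuda.is_available() else "cpu"
model1, model2 = TinyNet().to(device), TinyNet().to(device)

batch = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

# Workaround: two independent device copies (copy=True forces a copy
# even when source and destination are the same device).
data1, label1 = batch.to(device, copy=True), target.to(device, copy=True)
data2, label2 = batch.to(device, copy=True), target.to(device, copy=True)

out1 = model1(data1, label1)
model1.loss.backward()

out2 = model2(data2, label2)
model2.loss.backward()

# Both models now have gradients computed from independent input tensors.
assert all(p.grad is not None for p in model1.parameters())
assert all(p.grad is not None for p in model2.parameters())
```

With separate copies, nothing one model does to its inputs (e.g. an in-place modification inside `forward`) can leak into the other model's forward pass.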
I thought that once the backward pass for loss1 finished, its computational graph would be freed and would have no effect on subsequent computations.
Can anyone explain why sharing the same device tensors breaks model2's training?