Is there something wrong with training multiple models on the same batch simultaneously?

I did something like this:
# move the batch to the GPU once
data = data.to(device)
label = label.to(device)

# forward and backward for the first model
out1 = model1(data, label)
loss1 = model1.loss
loss1.backward()

# forward and backward for the second model, on the same tensors
out2 = model2(data, label)
loss2 = model2.loss
loss2.backward()

model1 and model2 are identical models. With the code above, model2 did not learn correctly at all.
But if I copy the batch to the GPU twice and train model1 and model2 on their own copies, both models train fine.
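Roughly like this, for illustration (a sketch in my own words; the separate copies are the only change from the snippet above, assuming data and label start on the CPU):

data1 = data.to(device)      # first copy of the batch on the GPU, for model1
label1 = label.to(device)
data2 = data.to(device)      # second, independent copy for model2
label2 = label.to(device)

out1 = model1(data1, label1)
model1.loss.backward()

out2 = model2(data2, label2)
model2.loss.backward()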

I thought that once the backward pass for loss1 finished, its computational graph would be freed and would have no effect on the following computations.
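For example, a minimal sanity check of that reasoning (with two independent nn.Linear layers standing in for my models) runs without errors:

import torch
import torch.nn as nn

x = torch.randn(8, 4)                      # shared input, does not require grad
m1, m2 = nn.Linear(4, 1), nn.Linear(4, 1)

loss1 = m1(x).sum()
loss1.backward()                           # frees only the graph built by m1's forward

loss2 = m2(x).sum()
loss2.backward()                           # fine: m2's forward built its own graph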
Can anyone explain this?

Are you manipulating data in-place inside model1 somehow, so that model2 sees a different data distribution?
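For example, something like this (a hypothetical forward, just to illustrate the pattern) would silently corrupt the shared batch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, data, label):
        data.sub_(data.mean())             # in-place op mutates the caller's tensor
        out = self.fc(data)
        self.loss = F.cross_entropy(out, label)
        return out

data = torch.randn(8, 4)
label = torch.randint(0, 2, (8,))
m1 = ToyModel()
before = data.clone()
m1(data, label)
print(torch.equal(before, data))           # False: a second model would see different inputs

Printing something like data.sum() before and after model1's forward, or passing data.clone() to each model, should tell you whether this is what's happening.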