I need to train two models sequentially, where the first model's output is used in the second model's loss. Part of the code is as follows:

```
# train model1 on its own loss
pred1 = model1(data)
loss1 = loss_fn1(pred1, targets)
optimizer1.zero_grad()
loss1.backward()
optimizer1.step()

# train model2 on its own loss plus a KL term against model1's predictions
pred2 = model2(data)
loss2 = loss_fn2(pred2, targets)
kl_loss = divergence_loss_fn(
    F.softmax(pred1 / t, dim=1),
    F.softmax(pred2 / t, dim=1),
)
loss = (1 - alpha) * loss2 + alpha * kl_loss
optimizer2.zero_grad()
loss.backward()
optimizer2.step()
```

If I run this code as it is, I face the following error:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
```
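As far as I understand, this happens because `kl_loss` still depends on `pred1`, so `loss.backward()` has to traverse model1's graph again after `loss1.backward()` has already freed its saved tensors. Here is a standalone toy example (not my actual models) that seems to reproduce the same error:

```
import torch

x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 3, requires_grad=True)

h = x @ w                 # shared intermediate, like pred1 in my code
loss_a = h.sum()          # analogous to loss1
loss_b = (h ** 2).sum()   # also depends on h, like kl_loss depends on pred1

loss_a.backward()         # frees the saved tensors of the shared part of the graph
loss_b.backward()         # RuntimeError: Trying to backward through the graph a second time
```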

If I change the first backward call to `loss1.backward(retain_graph=True)`, then I face the following error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2048, 10]], which is output 0 of TBackward, is at version 1565; expected version 1564 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
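I suspect this second error appears because `optimizer1.step()` updates model1's parameters in place, so backpropagating through model1's (retained) graph a second time sees the changed parameter versions. One workaround I am considering (assuming `pred1` should only act as a fixed target for model2 in the KL term, so no gradient needs to flow back into model1 there) is to detach `pred1`:

```
# possible change, assuming the KL term should not update model1
kl_loss = divergence_loss_fn(
    F.softmax(pred1.detach() / t, dim=1),  # detach() so this term does not reach back into model1's graph
    F.softmax(pred2 / t, dim=1),
)
```

I am not sure whether detaching is the right way to handle this, or whether the two backward passes should be structured differently.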

Your help would be much appreciated.