I am using two models chained together. The setup is the following: I have a set of inputs X1, and I use the first model to generate a Y1_i for each X1_i. I then train the second model on these (X1, Y1) pairs for E epochs. I also have another dataset (X2, Y2), on which I compute a loss with the second model. My goal is to backpropagate that loss through the training process of the second model in order to update the parameters of the first model, which generated the data the second model was trained on. So far I have running code that looks somewhat like this:

```
import torch

# model1, model2, someLoss1, someLoss2, X1, X2, Y2 and E are defined elsewhere
optimizer1 = torch.optim.AdamW(model1.parameters(), lr=0.001)
optimizer2 = torch.optim.AdamW(model2.parameters(), lr=0.001)

# Generate the targets Y1 for model2 with model1 (the graph back to model1 is kept)
Y1 = torch.stack([model1(x_1) for x_1 in X1])

# Train model2 on the (X1, Y1) pairs for E epochs
for e in range(E):
    loss_model2 = torch.tensor(0.0)
    for data_index in range(X1.shape[0]):
        loss_model2 = loss_model2 + someLoss2(model2(X1[data_index]), Y1[data_index])
    loss_model2.backward(retain_graph=True)
    optimizer2.step()

# Evaluate the trained model2 on the external dataset and update model1
loss_model1 = torch.tensor(0.0)
for ext_data_index in range(X2.shape[0]):
    loss_model1 = loss_model1 + someLoss1(model2(X2[ext_data_index]), Y2[ext_data_index])
loss_model1.backward()
optimizer1.step()
```

Sorry that I can only share pseudo-code; I am not allowed to share company code.

The setup presented here runs without any errors. My fear, however, is that I am not backpropagating through all training iterations of model2, but only through the last one, and that the computation graphs of the previous iterations are discarded.
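
To make the concern concrete, here is a minimal sketch of how it could be checked. It is not the real code. It assumes PyTorch 2.x (for `torch.func.functional_call`), uses tiny `nn.Linear` layers as hypothetical stand-ins for both models, random tensors as data, a hand-written `mse` loss, and plain SGD instead of AdamW to keep the unrolled update short. Variant A trains model2 with a regular optimizer, whose in-place `step()` is not recorded by autograd. Variant B writes the update as ordinary tensor operations with `create_graph=True`, so each training step stays in the graph.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Hypothetical stand-ins for the real models and datasets.
model1 = nn.Linear(3, 2)
model2 = nn.Linear(3, 2)
X1 = torch.randn(8, 3)
X2, Y2 = torch.randn(5, 3), torch.randn(5, 2)
model1_params = list(model1.parameters())


def mse(pred, target):
    # Hand-written mean squared error (assumed loss for this sketch).
    return ((pred - target) ** 2).mean()


# Variant A: inner training with a regular optimizer (in-place updates).
opt2 = torch.optim.SGD(model2.parameters(), lr=0.1)
Y1 = model1(X1)
for _ in range(3):
    opt2.zero_grad()
    mse(model2(X1), Y1).backward(retain_graph=True)
    opt2.step()  # in-place update, not part of the autograd graph
outer_a = mse(model2(X2), Y2)
# allow_unused=True returns None for inputs the loss does not depend on.
grads_a = torch.autograd.grad(outer_a, model1_params, allow_unused=True)
print("Variant A:", ["None" if g is None else "tensor" for g in grads_a])

# Variant B: unrolled, differentiable inner training.
# Start from copies of model2's current parameters as fresh leaf tensors.
params = {k: v.detach().clone().requires_grad_(True)
          for k, v in model2.named_parameters()}
Y1 = model1(X1)
for _ in range(3):
    inner = mse(functional_call(model2, params, (X1,)), Y1)
    # create_graph=True keeps the gradient computation itself differentiable.
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    # Out-of-place SGD step: the new parameters depend on Y1 and thus on model1.
    params = {k: p - 0.1 * g for (k, p), g in zip(params.items(), grads)}
outer_b = mse(functional_call(model2, params, (X2,)), Y2)
grads_b = torch.autograd.grad(outer_b, model1_params)
print("Variant B:", ["None" if g is None else "tensor" for g in grads_b])
```

In this sketch every entry of `grads_a` is `None`, because the outer loss only sees model2's current parameter values, while `grads_b` contains actual tensors. Memory use in Variant B grows with the number of unrolled steps, since the graph of every step is kept.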