I want to train two cascaded networks, X -> Z -> Y, where Z = net1(X) and Y = net2(Z).

I want to optimize the parameters of the two networks alternately: with net1's parameters fixed, first train net2 with the MSE(predY, Y) loss until convergence; then use the converged MSE loss to run one training iteration of net1, and repeat.

So I define a separate optimizer for each network. My training code is below:

```
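import torch
import torch.nn as nn

# SimpleLinearF, X, Y, num_iters1 and num_iters2 are defined elsewhere (not shown).
net1 = SimpleLinearF()
opt1 = torch.optim.Adam(net1.parameters(), lr=0.01)
loss_func = nn.MSELoss()
for itera1 in range(num_iters1 + 1):
    predZ = net1(X)
    # Re-create net2 and its optimizer for every outer iteration.
    net2 = SimpleLinearF()
    opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)
    # Inner loop: train net2 while net1 is held fixed.
    for itera2 in range(num_iters2 + 1):
        predY = net2(predZ)
        loss = loss_func(predY, Y)
        if itera2 % (num_iters2 // 2) == 0:
            print('iteration: {:d}, loss: {:.7f}'.format(int(itera2), float(loss)))
        loss.backward(retain_graph=True)
        opt2.step()
        opt2.zero_grad()
    # Outer step: reuse the last inner-loop loss to update net1.
    loss.backward()
    opt1.step()
    opt1.zero_grad()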
```

However, I get the following error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an
inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of AsStridedBackward0, is at
version 502; expected version 501 instead. Hint: enable anomaly detection to find the
operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
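
Following the hint in the message, here is a stripped-down sketch that should trigger the same kind of error with anomaly detection switched on. The `nn.Linear` models and the random `X`, `Y` are placeholders assumed for illustration, not my real `SimpleLinearF` or data, and a single inner step stands in for the whole inner loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data and models (assumptions, not the real setup).
X = torch.randn(16, 1)
Y = 3.0 * X + 1.0
net1 = nn.Linear(1, 1)
net2 = nn.Linear(1, 1)
opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)
loss_func = nn.MSELoss()

error_message = None
# Anomaly mode records forward-pass tracebacks so the failing op can be located.
with torch.autograd.set_detect_anomaly(True):
    predZ = net1(X)
    loss = loss_func(net2(predZ), Y)
    loss.backward(retain_graph=True)
    opt2.step()          # updates net2's weights in place
    try:
        loss.backward()  # reuses the graph that saved the old weights
    except RuntimeError as err:
        error_message = str(err)

print(error_message)
```

In this reduced form the second `backward()` runs on a graph that was built before `opt2.step()` changed net2's weights, which looks like the same pattern as the last lines of my outer loop.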

Does anyone know why this error occurs, and how I should fix it? Many thanks.
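
For reference, one restructuring that might avoid the problem is sketched below. It rests on two assumptions: that net2 can be trained on a detached `predZ`, and that it is acceptable to recompute the forward pass through both networks for the net1 update. The `nn.Linear` models, the synthetic data, and the small iteration counts are hypothetical stand-ins for my real setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data and models (assumptions standing in for SimpleLinearF, X, Y).
X = torch.randn(64, 1)
Y = 2.0 * X - 1.0
net1 = nn.Linear(1, 1)
opt1 = torch.optim.Adam(net1.parameters(), lr=0.01)
loss_func = nn.MSELoss()
num_iters1, num_iters2 = 3, 50  # small hypothetical values

for itera1 in range(num_iters1):
    # Inner loop: fit a fresh net2 on a detached Z, so no gradient reaches net1.
    predZ = net1(X).detach()
    net2 = nn.Linear(1, 1)
    opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)
    for itera2 in range(num_iters2):
        opt2.zero_grad()
        loss2 = loss_func(net2(predZ), Y)
        loss2.backward()
        opt2.step()

    # Outer step: a fresh forward pass through both networks uses net2's
    # current weights, so autograd has no stale saved tensors.
    opt1.zero_grad()
    loss1 = loss_func(net2(net1(X)), Y)
    loss1.backward()
    opt1.step()  # only net1's parameters are updated here
    print('outer iteration: {:d}, loss: {:.7f}'.format(itera1, float(loss1)))
```

Note that this treats the trained net2 as fixed during the net1 update and does not differentiate through the inner optimization itself, so I am not sure it is exactly equivalent to what I described above.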