How to train cascaded networks iteratively?

I want to train two cascaded networks, e.g. X -> Z -> Y, with Z = net1(X) and Y = net2(Z).
I would like to optimize the parameters of the two networks alternately, i.e., for fixed parameters of net1, first train the parameters of net2 with the MSE(predY, Y) loss until convergence; then use the converged MSE loss to take one update step on net1, and so on.
So I define a separate optimizer for each network. My setup and training code are below:
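
For completeness, SimpleLinearF in my code is just a single-layer linear module, and X, Y are toy tensors; the definitions below are placeholders with the shapes I use (the exact data and iteration counts are only examples):

import torch
import torch.nn as nn

# stand-in for SimpleLinearF: a single 1-in, 1-out linear layer
class SimpleLinearF(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

# toy data standing in for my real X and Y
X = torch.randn(100, 1)
Y = 3.0 * X + 0.5

num_iters1 = 100    # outer iterations (updates of net1)
num_iters2 = 500    # inner iterations (training of net2 per outer step)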

net1 = SimpleLinearF()
opt1 = torch.optim.Adam(net1.parameters(), lr=0.01)
loss_func = nn.MSELoss()

for itera1 in range(num_iters1 + 1):
    predZ = net1(X)

    # inner loop: for the current (fixed) net1, train a fresh net2 until the loss converges
    net2 = SimpleLinearF()
    opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)
    for itera2 in range(num_iters2 + 1):
        predY = net2(predZ)
        loss = loss_func(predY, Y)
        if itera2 % (num_iters2 // 2) == 0:
            print('iteration: {:d}, loss: {:.7f}'.format(int(itera2), float(loss)))
        loss.backward(retain_graph=True)
        opt2.step()
        opt2.zero_grad()

    # outer step: backpropagate the converged loss to update net1
    loss.backward()
    opt1.step()
    opt1.zero_grad()

However, I encounter the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an 
inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of AsStridedBackward0, is at
version 502; expected version 501 instead. Hint: enable anomaly detection to find the 
operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
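
If it matters, my understanding of the hint is to turn on anomaly detection before the training loop, e.g.:

# per the hint in the error message: make autograd record a traceback for each op
# so the failing backward operation can be located (this does not fix the error)
torch.autograd.set_detect_anomaly(True)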

Does anyone know why this error occurs, and how I should fix it? Many thanks.