Multiple successive backward passes to train each sample

Hello everyone, I am currently working on a missing-data imputation problem using an autoencoder, where each corrupted sample (randomly zeroed) and its ground truth have to be trained on multiple times. So I run the optimizer and backpropagation repeatedly on each sample, as shown below, which raises the following runtime error:

“Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.”

I followed the hint and added retain_graph=True to backward(), but it then produced a different error:

“one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [5, 9]], which is output 0 of TBackward, is at version 9; expected version 8 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).”

I also tried enabling anomaly detection as the hint suggests, but it did not help, and I am confused now. I don't yet understand this situation well enough to work out what is going wrong, so I would appreciate any advice. Many thanks!

import numpy as np
import torch.nn as nn
import torch.optim as optim

model = AutoEncoder1D(n_bus=3, latent_dim=5)
criterion = nn.MSELoss()
n_epochs = 100
epsilon = 1e-4
optimizer = optim.Adam(model.parameters(), lr=1e-3)
err_all = []

for i, (X_corr, X_gr) in enumerate(zip(X_corrupted, X_groundtr)):
    # indices of the zeroed (missing) entries in this sample
    missing_idx = np.where(X_corr == 0)[1]
    err_sample = []

    for j in range(n_epochs):
        # loss calculation and saving
        X_pred = model(X_corr)
        loss_epoch = criterion(X_pred, X_gr)
        err_sample.append(loss_epoch.item())

        # back propagation
        # with torch.autograd.set_detect_anomaly(True):
        optimizer.zero_grad()
        loss_epoch.backward()  # retain_graph=True
        optimizer.step()

        # updating the predicted values for missing positions
        X_corr[0][missing_idx] = X_pred[0][missing_idx]

        # tracking the training error
        if j % 20 == 0:
            print("MSE Training Error at {}_th epoch of {}_th sample: {}"
                  .format(j, i, loss_epoch))

    err_all.append(np.max(err_sample))


I guess you are keeping the computation graph alive via:

X_corr[0][missing_idx] = X_pred[0][missing_idx]

Assuming you want to replace some values in the input without tracking the gradient history, you could .detach() the right-hand side before assigning it to X_corr.
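For example, here is a minimal runnable sketch of that pattern, using a plain nn.Linear as a hypothetical stand-in for your AutoEncoder1D and made-up shapes:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(9, 9)              # hypothetical stand-in for AutoEncoder1D
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

X_corr = torch.rand(1, 9)            # corrupted sample (does not require grad)
X_gr = torch.rand(1, 9)              # ground truth
missing_idx = torch.tensor([2, 5])   # positions that were zeroed out

for j in range(100):
    X_pred = model(X_corr)
    loss = criterion(X_pred, X_gr)

    optimizer.zero_grad()
    loss.backward()                  # no retain_graph needed anymore
    optimizer.step()

    # detach the predictions before writing them back into the input,
    # so the next iteration builds a fresh graph instead of reusing the old one
    X_corr[0][missing_idx] = X_pred[0][missing_idx].detach()

The in-place write into X_corr is then just a plain data update, and each forward pass starts a new graph, so both errors should disappear.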

Many thanks, that helped me a lot!
If there is anything else I should fix, I would be grateful for your further advice. I hope you have a good day!

Assuming the error is gone, I don’t see anything else obvious. Or are you still hitting the issue?