Error during backpropagation

Hello everyone, I am working on a research project that requires using the sum of two loss functions.
Below is a snippet of my code:

    for epoch in range(epochs):
        e = epoch
        model.train()
        running_loss = 0
        for i, data in enumerate(train_loader):
            #scheduler.step()
            inputs, labels = data['points'].to(device), data['labels'].to(device)
            
            #Discriminator
            #Real image
            inputs2 = torch.cat((inputs, labels.type(torch.float64).reshape(1,2048,1)),axis=2)
            output = disc(inputs2.transpose(1,2).float())
            ls_real = disc_loss(output,num=1)
            optimizerD.zero_grad()
            ls_real.backward()
            
            #Prediction
            output = model(inputs.transpose(1,2),onehot)
            output_pred = output
            _,pred = torch.max(output.data,1)
            #criterion = torch.nn.CrossEntropyLoss()
            #loss = criterion(output, labels)
            
            #Fake image
            inputs2 = torch.cat((inputs, pred.type(torch.float64).reshape(1,2048,1)),axis=2)
            output = disc(inputs2.transpose(1,2).float())
            ls_fake = disc_loss(output, num=0)
            ls_fake.backward(retain_graph=True)
            
            ls_D = ls_real + ls_fake
            optimizerD.step()
            
            optimizer.zero_grad()
            loss = main_loss(output_pred,labels)
            loss = loss + ls_fake
            loss.backward()
            optimizer.step()

However, I get an error at loss.backward() (after loss = loss + ls_fake) with this error message:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 2]], which is output 0 of TBackward, is at version 11; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
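If I understand the hint correctly, it refers to PyTorch's anomaly detection, which can be switched on before the training loop so that backward() reports which forward operation produced the offending tensor:

    import torch

    # Makes backward() report the forward op whose output was later modified
    # in place (noticeably slower, so only useful for debugging)
    torch.autograd.set_detect_anomaly(True)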


Could someone please help me with this problem? Thank you!

Try:

As the error says, there might be a problem with you replacing the loss value in loss = loss + ls_fake; try storing the sum in a new variable instead of overwriting loss.
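A minimal sketch of that change, keeping the variable names from the snippet above (the name total_loss is just for illustration):

    # Keep the original generator loss tensor and put the sum in a new variable,
    # so `loss` itself is never reassigned
    loss = main_loss(output_pred, labels)
    total_loss = loss + ls_fake
    total_loss.backward()
    optimizer.step()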

@CedricLy Hi, thank you for the quick reply! I’ve just tried your way, but it still gives me the same error message :frowning:
Could there be another way to solve this issue?
Oh, by the way, the output used for loss is different from the one used for ls_fake: the model behind ls_fake has two outputs, while the model behind loss has four outputs.

OK, so the problem is the tensor ls_fake.
optimizerD.step() updates the discriminator's parameters in place, which invalidates the retained graph that ls_fake was computed from. The gradients have to be recalculated, i.e. ls_fake has to be recomputed after optimizerD.step() and before it is used in the loss for optimizer.step().
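For reference, a minimal sketch of what that reordering could look like, keeping the variable names from the snippet above (only the tail of the inner loop is shown):

    # ... ls_real, output_pred, pred and the first ls_fake computed as in the original loop ...
    optimizerD.zero_grad()
    ls_real.backward()
    ls_fake.backward()   # no retain_graph needed, ls_fake is recomputed below
    optimizerD.step()    # updates the discriminator's parameters in place

    # Recompute the fake-sample loss with the updated discriminator, so the
    # combined loss is built on a fresh graph rather than the stale one
    inputs2 = torch.cat((inputs, pred.type(torch.float64).reshape(1, 2048, 1)), axis=2)
    output = disc(inputs2.transpose(1, 2).float())
    ls_fake = disc_loss(output, num=0)

    optimizer.zero_grad()
    loss = main_loss(output_pred, labels) + ls_fake
    loss.backward()
    optimizer.step()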

@CedricLy Thank you, it works now :slight_smile: