Error during backpropagation

Hello everyone, I am working on a research project that requires using the sum of two loss functions.
Below is a snippet of my code:

    for epoch in range(epochs):
        e = epoch
        model.train()
        running_loss = 0
        for i, data in enumerate(train_loader):
            #scheduler.step()
            inputs, labels = data['points'].to(device), data['labels'].to(device)
            
            #Discriminator
            #Real image
            inputs2 = torch.cat((inputs, labels.type(torch.float64).reshape(1,2048,1)),axis=2)
            output = disc(inputs2.transpose(1,2).float())
            ls_real = disc_loss(output,num=1)
            optimizerD.zero_grad()
            ls_real.backward()
            
            #Prediction
            output = model(inputs.transpose(1,2),onehot)
            output_pred = output
            _,pred = torch.max(output.data,1)
            #criterion = torch.nn.CrossEntropyLoss()
            #loss = criterion(output, labels)
            
            #Fake image
            inputs2 = torch.cat((inputs, pred.type(torch.float64).reshape(1,2048,1)),axis=2)
            output = disc(inputs2.transpose(1,2).float())
            ls_fake = disc_loss(output, num=0)
            ls_fake.backward(retain_graph=True)
            
            ls_D = ls_real + ls_fake
            optimizerD.step()
            
            optimizer.zero_grad()
            loss = main_loss(output_pred,labels)
            loss = loss + ls_fake
            loss.backward()
            optimizer.step()

However, I get an error at loss.backward() (after loss = loss + ls_fake) with this error message:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 2]], which is output 0 of TBackward, is at version 11; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
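If I understand the hint correctly, it refers to PyTorch's anomaly detection, which can be switched on before the training loop so that backward() reports which forward operation produced the offending tensor:

    import torch

    # Makes backward() report the forward op whose output was later modified
    # in place (noticeably slower, so only useful for debugging)
    torch.autograd.set_detect_anomaly(True)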


Could someone please help me with this problem? Thank you!

Try:

As the error says, there might be a problem with you replacing the loss value in loss = loss + ls_fake; try storing the sum in a new variable instead of overwriting loss.
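A minimal sketch of that change, keeping the variable names from the snippet above (the name total_loss is just for illustration):

    # Keep the original generator loss tensor and put the sum in a new variable,
    # so `loss` itself is never reassigned
    loss = main_loss(output_pred, labels)
    total_loss = loss + ls_fake
    total_loss.backward()
    optimizer.step()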

@CedricLy Hi, thank you for the quick reply! I’ve just tried your way, but it still gives me the same error message :frowning:
Could there be another way to solve this issue?
Oh, by the way, the output used for loss is different from the one used for ls_fake: the model behind ls_fake has two outputs, while the model behind loss has four outputs.

OK, so the problem is the tensor ls_fake.
optimizerD.step() updates the discriminator's parameters in place, which invalidates the retained graph that ls_fake was computed from. The gradients have to be recalculated, i.e. ls_fake has to be recomputed after optimizerD.step() and before it is used in the loss for optimizer.step().
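For reference, a minimal sketch of what that reordering could look like, keeping the variable names from the snippet above (only the tail of the inner loop is shown):

    # ... ls_real, output_pred, pred and the first ls_fake computed as in the original loop ...
    optimizerD.zero_grad()
    ls_real.backward()
    ls_fake.backward()   # no retain_graph needed, ls_fake is recomputed below
    optimizerD.step()    # updates the discriminator's parameters in place

    # Recompute the fake-sample loss with the updated discriminator, so the
    # combined loss is built on a fresh graph rather than the stale one
    inputs2 = torch.cat((inputs, pred.type(torch.float64).reshape(1, 2048, 1)), axis=2)
    output = disc(inputs2.transpose(1, 2).float())
    ls_fake = disc_loss(output, num=0)

    optimizer.zero_grad()
    loss = main_loss(output_pred, labels) + ls_fake
    loss.backward()
    optimizer.step()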

@CedricLy Thank you, it works now :slight_smile: