Autograd backward issue with a combined loss function using the same inputs

vincent024 · November 23, 2022, 1:21am

Hi,
I have a loss function that combines several loss terms computed from the same inputs. Even though I removed all in-place operations and tried various things such as “.clone()” and “.clone().detach()”, I still get an in-place operation error from autograd for the second part, and I can’t find where the problem is.

Also, each of these two parts works on its own without any in-place operation error; only when they are used together does it fail.

The main part related to the problem is:

def forward(self, segin, edgein, segmask, edgemask):
    main_loss = 0
    if self.seg_weight > 0:
        seg_loss = self.seg_weight * self.seg_loss(segin, segmask)
        main_loss = main_loss + seg_loss

    if self.att_weight > 0:
        # Note: segin is detached here, so att_loss cannot send any gradient back to the model.
        a = self.edge_attention(segin.detach().clone(), segmask.detach().clone(), edgein.detach().clone())
        att_loss = self.att_weight * a
        main_loss = main_loss + att_loss

    main_loss = torch.mean(main_loss)  # equivalent to main_loss.mean()
    return main_loss

def edge_attention(self, input, target, edge):  # seg_pred, seg_gt, edge_pred
    filler = torch.ones_like(target)
    # Keep the ground truth where the edge prediction is above 0.5, otherwise use 1.
    targets = torch.where(edge > 0.5, target, filler)
    return self.seg_loss(input.clone(), targets)
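A small sketch of what edge_attention does to the targets, and of what the .detach().clone() calls in forward do to the prediction. The toy tensors are made up for illustration; only torch.where, torch.ones_like and detach()/clone() come from the code above.

```python
import torch

# Toy stand-ins for seg_gt, edge_pred and seg_pred (values and shapes are illustrative).
target = torch.tensor([0., 0., 1., 0.])
edge = torch.tensor([0.9, 0.1, 0.7, 0.3])
seg_pred = torch.rand(4, requires_grad=True)

# Same masking as edge_attention: keep the ground truth where the edge
# prediction is above 0.5, otherwise fill with ones.
filler = torch.ones_like(target)
targets = torch.where(edge > 0.5, target, filler)
print(targets)  # tensor([0., 1., 1., 1.])

# Passing a detached copy of the prediction (as forward() does with
# segin.detach().clone()) cuts it out of the graph: nothing computed
# from it requires grad.
detached = seg_pred.detach().clone()
print(detached.requires_grad)                    # False
print((detached * targets).sum().requires_grad)  # False
```

So, with segin detached, the attention term is still computed but contributes no gradient to the model; this is separate from the in-place error itself.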

Here, seg_loss is a Dice loss, which also works fine on its own, or when only one of these two parts is enabled (but not when both are used together).

Does anyone have an idea what could be causing the in-place operation error and how I could fix it?

I also tried “torch.autograd.set_detect_anomaly(True)”, but either I don’t know where to set it or it doesn’t tell me where exactly the issue is.
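A minimal sketch of where anomaly detection has to be enabled: it must already be active while the forward pass runs (that is when autograd records where each operation was created) and still active during backward (so that recorded traceback is printed when a gradient computation fails). The tiny nn.Linear model and random input are stand-ins for the real network and data.

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the real model
x = torch.randn(3, 4)

# Option 1: enable it globally, once, before the training loop starts:
# torch.autograd.set_detect_anomaly(True)

# Option 2: scope it with the context manager around BOTH forward and backward.
with torch.autograd.detect_anomaly():
    loss = model(x).pow(2).mean()  # forward: creation tracebacks are recorded here
    loss.backward()                # backward: on failure, the forward traceback is shown
```

Anomaly mode slows training noticeably, so it is best kept to debugging runs.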

soulitzer · November 30, 2022, 8:48pm

Are you running into the error “one of the variables needed for gradient computation has been modified by an inplace operation”?

vincent024 · November 30, 2022, 9:17pm

Yes, I am getting that kind of error…

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 21, 1]] is at version 21; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
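The version numbers in this message come from the version counter that every tensor carries: each in-place operation bumps it, and autograd remembers the version of each tensor it saves for backward. Read that way, the message says a tensor of shape [1, 21, 1] was saved at version 0 and had been modified in place 21 times by the time backward ran. A minimal sketch of the mechanism (the tensors are illustrative; _version is the counter attribute autograd checks):

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.exp()            # ExpBackward saves its output y, since d/dx exp(x) = exp(x)
saved_at = y._version  # version of y at the moment it was saved
z = y.sum()

for _ in range(3):
    y.mul_(1.0)        # even a value-preserving in-place op bumps the counter

bumps = y._version - saved_at
print(bumps)           # 3

try:
    z.backward()
    msg = ""
except RuntimeError as err:
    msg = str(err)     # same kind of "modified by an inplace operation" error
print(msg)
```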

Strangely, I’m also getting the same error with another loss function (which also sums multiple losses).
When I enable “torch.autograd.set_detect_anomaly(True)”, it points to the line computing “cardinality” in the main function I use for the Jaccard loss, shown below (it pointed to the same place before as well):

import torch


def soft_jaccard_score_with_weight(output: torch.Tensor, target: torch.Tensor, weight: torch.Tensor,
                                   smooth: float = 0.0, eps: float = 1e-7, dims=None) -> torch.Tensor:
    assert output.size() == target.size()

    if dims is not None:
        intersection = torch.sum(output * target * weight, dim=dims)
        cardinality = torch.sum((output + target) * weight, dim=dims)
    else:
        intersection = torch.sum(output * target * weight)
        cardinality = torch.sum((output + target) * weight)
    union = cardinality - intersection
    jaccard_score = (intersection + smooth) / (union + smooth).clamp_min(eps)
    return jaccard_score

However, “torch.sum” is not an in-place operation, so I don’t understand why it points there.

Also, “weight” and “target” don’t require gradients; only “output” (the model’s prediction) does.
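A sketch of why the traceback can point at the cardinality line even though torch.sum is not in place: both (output + target) * weight and output * target * weight need weight to compute the gradient with respect to output, so autograd saves weight whether or not weight itself requires grad. If weight is then modified in place before backward (simulated below with weight.mul_), backward fails, and anomaly mode reports the forward line that saved the tensor rather than the line that modified it. The random tensors are illustrative only.

```python
import torch


def soft_jaccard_score_with_weight(output, target, weight, smooth=0.0, eps=1e-7):
    # Same computation as the function above, without the `dims` branch.
    intersection = torch.sum(output * target * weight)
    cardinality = torch.sum((output + target) * weight)
    union = cardinality - intersection
    return (intersection + smooth) / (union + smooth).clamp_min(eps)


output = torch.rand(2, 5, requires_grad=True)  # model prediction
target = (torch.rand(2, 5) > 0.5).float()      # requires_grad=False
weight = torch.rand(2, 5)                      # requires_grad=False, but saved for backward

loss = 1.0 - soft_jaccard_score_with_weight(output, target, weight)
weight.mul_(2.0)  # in-place change AFTER the forward pass saved `weight`

try:
    loss.backward()
    failed = False
except RuntimeError:
    failed = True  # the "modified by an inplace operation" error

# A fresh forward pass with no later in-place change backpropagates fine.
loss2 = 1.0 - soft_jaccard_score_with_weight(output, target, weight)
loss2.backward()
```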

In case it’s useful: I am using PyTorch 1.11.0 (could this be a bug in that version?).

vincent024 · November 30, 2022, 10:46pm

Well, I finally solved my problem.

The problem came from the fact that “soft_jaccard_score_with_weight” is called in the “forward” of a class that computes the “dynamic weights” used at each iteration/training batch.

Since I was calling the same loss object (an instance of this loss class) several times in a row, the weight it stores was probably updated in place by a later call before backward ran. Autograd then saw that a tensor it had saved for the gradient computation (even though that tensor does not require grad) had been modified in place, which caused the error.

Simply using a separate loss object for each term, so that they are independent of each other, solved my problem.
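A minimal sketch of this failure mode and the fix. DynamicWeightLoss is a hypothetical stand-in for the dynamic-weight loss class in this thread: it is assumed to keep its weights in a stored tensor and refresh them in place (copy_) on every call. Calling one instance twice before backward overwrites a tensor that the first call saved, while two separate instances each own their own weight tensor.

```python
import torch


class DynamicWeightLoss(torch.nn.Module):
    """Hypothetical loss that refreshes a stored weight tensor in place on every call."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.register_buffer("weight", torch.ones(num_classes))

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # In-place update: bumps the version counter of the same tensor
        # that a previous call may have saved for backward.
        self.weight.copy_(1.0 / (target.sum(dim=0) + 1.0))
        intersection = torch.sum(output * target * self.weight)
        cardinality = torch.sum((output + target) * self.weight)
        return 1.0 - intersection / (cardinality - intersection).clamp_min(1e-7)


def combined(loss_a, loss_b, output, target):
    return loss_a(output, target) + loss_b(output, target)


torch.manual_seed(0)
target = (torch.rand(8, 3) > 0.5).float()
output = torch.rand(8, 3, requires_grad=True)

# One shared object: the second call overwrites the weight saved by the first.
shared = DynamicWeightLoss(3)
try:
    combined(shared, shared, output, target).backward()
    shared_failed = False
except RuntimeError:
    shared_failed = True

# Separate objects: each call saves its own, untouched weight tensor.
combined(DynamicWeightLoss(3), DynamicWeightLoss(3), output, target).backward()
```

Under the same assumptions, another option would be to compute the weights out of place on each call (for example as a local tensor passed to the score function) instead of writing them into a stored tensor with copy_.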