Multiple loss gradients

Hi, I’m working on implementing the Pareto-efficient fairness algorithm for fairness mitigation, which involves the composite loss function given below:

loss1 = cross-entropy loss
loss2 = lambda * (alpha * L1-norm(error) + (1 - alpha) * variance(error))

loss = loss1 + loss2

where error is a vector of subgroup metrics (accuracies).

Is loss2 differentiable? Also, how do I get the gradients of loss1 and loss2 w.r.t. the model parameters separately?

It depends on how error is calculated, and in particular whether the operations are differentiable.
E.g. if you’ve calculated the subgroup metrics using torch.argmax, the result won’t be differentiable and you would need to define the gradients manually, e.g. via a custom backward operation.
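To illustrate, here is a small sketch (with made-up logits and labels for one subgroup) showing that a metric computed via torch.argmax carries no gradient history, while the cross-entropy loss does:

```python
import torch

# Hypothetical logits and labels for one subgroup (illustrative only).
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))

# Accuracy via argmax: argmax is not differentiable, so the result
# has no grad_fn and gradients cannot flow back to the logits.
preds = torch.argmax(logits, dim=1)
acc = (preds == labels).float().mean()
print(acc.requires_grad)  # False

# A differentiable subgroup metric: the cross-entropy loss.
ce = torch.nn.functional.cross_entropy(logits, labels)
print(ce.requires_grad)  # True
```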

Hi, Thanks for the reply!

So, the error is defined as,

ce_n = cross-entropy loss of subgroup n
f1 = torch.tensor([1/ce_1, 1/ce_2, 1/ce_3])
f2 = torch.tensor([1/ce_4, 1/ce_5, 1/ce_6])

error = 1 - f1/f2

The gradient of loss2 always seems to be zero. Could you please help me figure out why?

Rewrapping tensors into a new tensor will detach them from the computation graph, so you would need to use torch.stack or torch.cat instead to create f1 and f2.
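A small sketch of the difference, using toy stand-ins for the subgroup losses (the quadratic expressions are just hypothetical values that carry gradient history):

```python
import torch

# Toy stand-ins for the per-subgroup losses (hypothetical; they
# carry gradient history back to w).
w = torch.randn(3, requires_grad=True)
ce_1, ce_2, ce_3 = w[0] ** 2 + 1, w[1] ** 2 + 1, w[2] ** 2 + 1

# Rewrapping in torch.tensor copies the values into a new leaf
# tensor and cuts the computation graph.
f1_detached = torch.tensor([1 / ce_1, 1 / ce_2, 1 / ce_3])
print(f1_detached.requires_grad)  # False -> gradients won't flow

# torch.stack keeps the operands attached to the graph.
f1 = torch.stack([1 / ce_1, 1 / ce_2, 1 / ce_3])
print(f1.requires_grad)  # True
f1.sum().backward()
print(w.grad is not None)  # True
```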

Thank you. I made some changes. Now, the following issue occurs:

I need to update the values in f1 for every batch based on the model’s subgroup performance.
When I tried using the indices, as in f1[i] = new_value, I got the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 1]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

How can I update the value of f1 for every batch while training?

I’m not familiar with your use case, but it seems you are currently replacing a computed activation value that is needed to calculate the gradients, and this yields the error.
Where is new_value coming from and would it be possible to change the computation of f1 so that the value is computed directly?

Sorry, I wasn’t clear.

f1 is the inverse of the cross-entropy losses and it needs to be calculated after every batch is forward propagated. So, new_value basically denotes the cross-entropy losses for that batch. I am trying to update the values in f1 and use it to calculate error as mentioned above, which will be used in calculating the final loss. After updating f1, calling loss.backward() is resulting in the above error. How can I update f1 in every iteration?

You won’t be able to assign values to tensors if they are needed to compute the gradient.
In your case it seems that f1 is indeed needed in the backward pass to compute some gradients, so manipulating it in place is disallowed.
This short example demonstrates the issue:

import torch
import torch.nn as nn

w1 = nn.Parameter(torch.randn(1))
w2 = nn.Parameter(torch.randn(1))
x = torch.randn(1)

y = w1 * x
out = y * w2
# y[0] = 1.
out.mean().backward()

print(w2.grad, y)

Here you can see that y is needed to compute the gradient for w2.
If you try to manipulate it after its usage (uncomment the assignment), an error will be raised:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of MulBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Thanks for the example. So from my understanding, there is no way to manipulate tensors that are required in the backward pass? And creating a new tensor would detach it from the computation graph?

Is this also the case if I set retain_graph = True?

retain_graph wouldn’t change the issue, since the calculation itself is failing because the needed values have been overwritten.
If you know how the gradient should be calculated after the manipulation, you could try to implement a custom autograd.Function as described here.
In any case, I’m still unsure why you want to manipulate the values.
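For reference, a custom autograd.Function could be sketched roughly like this (the forward/backward math here is hypothetical; you would replace it with your own manipulation and the gradient you want it to have):

```python
import torch

# Minimal sketch of a custom autograd.Function with a manually
# defined backward (assumed shape; adapt to your actual math).
class Inverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return 1.0 / x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(1/x)/dx = -1/x^2, chained with the incoming gradient.
        return grad_output * (-1.0 / x ** 2)

x = torch.tensor([2.0, 4.0], requires_grad=True)
y = Inverse.apply(x)
y.sum().backward()
print(x.grad)  # tensor([-0.2500, -0.0625])
```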

I will check that implementation.

I want a tensor with inverse cross-entropy losses. These loss values change for every forward pass. I’m not sure how to update the tensor values without manipulating the existing values.

You are already calculating the “inverse” ce losses via: [1/ce_1, 1/ce_2, 1/ce_3], so instead of rewrapping them in a tensor, you could use torch.stack([1/ce_1, 1/ce_2, 1/ce_3]), which would not detach these values.
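Putting it together, a hypothetical per-batch sketch (the model, data, and subgroup assignment below are made up for illustration): instead of writing new values into an old f1 with f1[i] = ..., rebuild f1 from the current batch’s subgroup losses every iteration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical model, batch, and subgroup ids (illustrative only).
model = nn.Linear(4, 3)
x = torch.randn(12, 4)
y = torch.randint(0, 3, (12,))
groups = torch.tensor([0, 1, 2] * 4)  # assumed subgroup id per sample

# Per-subgroup cross-entropy losses for this batch.
ce = [F.cross_entropy(model(x[groups == g]), y[groups == g]) for g in range(3)]

# torch.stack keeps the graph intact; f1 is a fresh tensor each
# batch, so no in-place modification of an old version is needed.
f1 = torch.stack([1.0 / c for c in ce])
f1.sum().backward()  # gradients flow back to the model parameters
print(model.weight.grad is not None)  # True
```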