Make custom loss function differentiable

I want to make a custom loss function, but (I think) it is currently not backpropagating. After doing some searching online, it seems like the error is because it is not differentiable. Could I get some help on how to make it differentiable?

This loss function is a simplified version of a more complicated loss function that I would like to use for a multi-label classifier. Here, there are 3 labels (0: Neutral, 1: Positive, 2: Negative). I calculate the loss as follows:

  • If the real label is neutral, I penalise positive and negative by 0.5.
  • If the real label is positive, I penalise neutral by 0.5 and negative by 1.
  • If the real label is negative, I penalise neutral by 0.5 and positive by 1.

def custom_loss_function(output, target):
    res = []
    for graph_no in range(len(output)):
        currOutput = output[graph_no]
        currTarget = target[graph_no]
        if currTarget == 0: # If real label is neutral
            currLoss = (currOutput[1]**2)*0.5 + (currOutput[2]**2)*0.5
            res.append(currLoss)
        elif currTarget == 1: # If real label is positive
            currLoss = (currOutput[0]**2)*0.5 + (currOutput[2]**2)
            res.append(currLoss)
        elif currTarget == 2: # If real label is negative
            currLoss = (currOutput[0]**2)*0.5 + (currOutput[1]**2)
            res.append(currLoss)
        
    finalRes = torch.mean(torch.Tensor(res))
    return finalRes

My train step:

def train():
    model.train()
    for data in loader:  # Iterate in batches over the training dataset.
        out = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.
        loss = loss_fn(out, data.y)  # Compute the loss.
        loss.requires_grad_()
        loss.backward()  # Derive gradients.
        optimizer.step()  # Update parameters based on gradients.
        optimizer.zero_grad()  # Clear gradients.

Any and all help would be greatly appreciated. Thank you!

If you want it to be differentiable, this is simple: change only the targets. Make a detached copy of the outputs, adjust that copy however you like to produce new targets, then apply the desired loss function to the original outputs and the new targets.

By way of example, suppose your target was something like the following:

import torch

target = torch.tensor([1]) # where possible values are 0, 1, and 2

Now suppose your sigmoid-activated output probabilities are:

output = torch.tensor([[0.1, 0.8, 0.1]])

We want a copy of the detached outputs which we’ll use to find the appropriate adjusted targets.

output_copy = output.detach().clone()  # independent copy, cut off from the graph

Now we can calculate the targets:

def adjust_targets(output_copy, target):
    new_target = torch.empty((0, 3))

    for graph_no in range(len(output_copy)):
        currOutput = output_copy[graph_no]
        currTarget = target[graph_no]
        targ = currOutput
        if currTarget == 0:  # If real label is neutral
            targ[1] -= (currOutput[1]**2)*0.5
            targ[2] -= (currOutput[2]**2)*0.5

        elif currTarget == 1:  # If real label is positive
            targ[0] -= (currOutput[0] ** 2) * 0.5
            targ[2] -= (currOutput[2] ** 2)

        elif currTarget == 2:  # If real label is negative
            targ[0] -= (currOutput[0] ** 2) * 0.5
            targ[1] -= (currOutput[1] ** 2)

        new_target = torch.cat([new_target, targ.unsqueeze(0)])
    return new_target
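
The same adjustment can also be written without the Python loop. The following is only a sketch under the assumption that the penalties are fixed per (true label, class) pair as in the question; the names PENALTY and adjust_targets_vectorized are made up for illustration and do not come from the original post.

```python
import torch

# Hypothetical penalty table: row = true label, column = output class.
# The values mirror the 0.5 / 1.0 penalties described in the question.
PENALTY = torch.tensor([[0.0, 0.5, 0.5],   # true label 0 (neutral)
                        [0.5, 0.0, 1.0],   # true label 1 (positive)
                        [0.5, 1.0, 0.0]])  # true label 2 (negative)

def adjust_targets_vectorized(output, target):
    """Return detached targets: output minus penalty * output**2, row by row."""
    detached = output.detach()          # no gradient flows through the targets
    return detached - PENALTY[target] * detached ** 2

output = torch.tensor([[0.1, 0.8, 0.1]])
target = torch.tensor([1])
new_target = adjust_targets_vectorized(output, target)
print(new_target)
```

Indexing PENALTY with the label tensor picks one row of weights per sample, so the whole batch is handled in a single expression and the input tensor is left unmodified.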

These adjusted targets are meant to be used with L1Loss:

import torch.nn as nn

criterion = nn.L1Loss()

Now apply the new function to get the adjusted targets:

new_target = adjust_targets(output_copy, target)
print(new_target)

That returns:

tensor([[0.0950, 0.8000, 0.0900]])

And then calculate the loss with the original output and new target:

loss = criterion(output, new_target)
print(loss)

Which gives:

tensor(0.0050)
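
One way to check that this loss still sends gradients back to the outputs is to mark the output as requiring gradients and call backward(). This is a minimal sketch that reuses the numbers above; in real training the output would come from the model instead of being created by hand.

```python
import torch
import torch.nn as nn

# Stand-in for a model output; requires_grad=True is an assumption made here
# so that the gradient can be inspected directly.
output = torch.tensor([[0.1, 0.8, 0.1]], requires_grad=True)

# Adjusted targets as computed earlier; a plain tensor with no graph attached.
new_target = torch.tensor([[0.095, 0.8, 0.09]])

criterion = nn.L1Loss()
loss = criterion(output, new_target)
loss.backward()

print(loss.requires_grad)  # the loss is attached to the graph
print(output.grad)         # gradient with respect to each output entry
```

Only the entries whose target was shifted (classes 0 and 2 in this example) receive a nonzero gradient.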

So, with some reverse engineering, we can customize the loss however we like by adjusting only the targets, while keeping the computation graph intact for backpropagation.
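
For comparison, the penalties from the question can also be computed directly on the outputs, provided the per-sample losses are combined without leaving the graph. The sketch below assumes the gradient was being lost at torch.Tensor(res), which builds a fresh tensor from values only; torch.stack keeps the history of each element. The name custom_loss_stacked is hypothetical.

```python
import torch

def custom_loss_stacked(output, target):
    """Question's penalties, combined with torch.stack to keep the graph."""
    losses = []
    for row, label in zip(output, target):
        if label == 0:    # real label is neutral
            losses.append(0.5 * row[1] ** 2 + 0.5 * row[2] ** 2)
        elif label == 1:  # real label is positive
            losses.append(0.5 * row[0] ** 2 + row[2] ** 2)
        else:             # real label is negative (label == 2)
            losses.append(0.5 * row[0] ** 2 + row[1] ** 2)
    # torch.stack joins the 0-dim loss tensors and preserves their grad history.
    return torch.stack(losses).mean()

output = torch.tensor([[0.1, 0.8, 0.1]], requires_grad=True)
target = torch.tensor([1])
loss = custom_loss_stacked(output, target)
loss.backward()
print(loss, output.grad)
```

With this variant there should be no need to call loss.requires_grad_() in the training loop, since the loss already depends on the model outputs.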

On a side note, if you’re trying to address an unbalanced dataset, you can do this more simply by passing the weight argument to CrossEntropyLoss.
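
A minimal sketch of that option follows. The weight values and logits are invented for illustration; in practice the weights would be chosen from the class frequencies. Note that CrossEntropyLoss expects raw, unnormalised scores (logits), not sigmoid or softmax outputs.

```python
import torch
import torch.nn as nn

# Hypothetical per-class weights for (neutral, positive, negative).
class_weights = torch.tensor([0.5, 1.0, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Made-up logits for a batch of two samples; stands in for the model output.
logits = torch.tensor([[0.2, 1.5, -0.3],
                       [1.0, 0.1, 0.4]], requires_grad=True)
target = torch.tensor([1, 0])  # class indices, one per sample

loss = criterion(logits, target)
loss.backward()
print(loss, logits.grad)
```

Each sample's loss is scaled by the weight of its true class, so mistakes on heavily weighted classes contribute more to the gradient.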