Learning doesn't work with custom loss function

(dima) #1

I've just started with pytorch and I'm trying to understand how to deal with custom loss functions, especially with some non-trivial ones.

Problem 1. I'd like to push my nn to maximize the true positive rate and at the same time minimize the false discovery rate. For example, increase the total score by 2 for each true positive and decrease it by 5 for each false positive.

def tp_fp_loss(yhat, y):
    # is_tp/is_fp are helpers that compare one prediction with its target
    total_score = 0
    for i in range(y.size(0)):
        if is_tp(yhat[i], y[i]):
            total_score += 2
        if is_fp(yhat[i], y[i]):
            total_score -= 5
    return -total_score

Problem 2. When y is a list of positive and negative rewards (y = [10, -5, -40, 23, 11, -7]), push the nn to maximize the sum of rewards.

def max_reward_loss(yhat, y):
    r = torch.autograd.Variable(torch.Tensor(y[yhat >= .5]), requires_grad=True).sum()
    return -r

Maybe I don't completely understand some autograd mechanics. The functions I implemented calculate the loss correctly, but learning with them doesn't work :frowning: What am I doing wrong? Can anybody help me with a working solution to either of these problems?


In your first problem, Autograd won't be able to calculate the gradients for the model parameters if you call backward on total_score, as the computation graph is detached from this variable (total_score is a plain Python int built from control flow, not from differentiable tensor operations).
You would need to calculate total_score from your model's output and target using differentiable tensor operations in some kind of loss function.
In general, you could use nn.BCEWithLogitsLoss with pos_weight, which makes it possible to trade off recall and precision.
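As a minimal sketch of that suggestion (the logits and targets here are made-up example values), nn.BCEWithLogitsLoss takes raw model outputs and a pos_weight tensor, and backward works because the graph stays attached:

```python
import torch
import torch.nn as nn

# Raw model outputs (logits, no sigmoid applied) and binary targets
logits = torch.tensor([0.2, -1.5, 0.8], requires_grad=True)
targets = torch.tensor([1., 0., 1.])

# pos_weight > 1 up-weights the positive class, trading precision for recall
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([2.0]))
loss = criterion(logits, targets)
loss.backward()  # works: the loss is built from differentiable tensor ops
```

In a real training loop the logits would come from your model, and pos_weight is a hyperparameter to tune.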

Also, in your second problem you are re-wrapping your tensor in a new Variable, which detaches it from the computation graph.
Try to use the tensors directly without creating a new tensor.
That being said, Variables are deprecated since 0.4.0. You can use tensors directly now and should set requires_grad=True in case your tensor needs gradients.
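Note also that even without the re-wrapping, the hard threshold yhat >= .5 passes no gradient back to yhat. One workaround (my own sketch, not the only option) is a soft, differentiable variant that weights each reward by the predicted probability instead of hard-selecting:

```python
import torch

def max_reward_loss(probs, rewards):
    # Soft version: each reward contributes proportionally to the
    # predicted probability, so gradients flow back to the model.
    return -(probs * rewards).sum()

rewards = torch.tensor([10., -5., -40., 23., 11., -7.])
logits = torch.randn(6, requires_grad=True)   # stand-in for model outputs

loss = max_reward_loss(torch.sigmoid(logits), rewards)
loss.backward()  # gradients now reach the logits
```

The sign convention matches the original: minimizing this loss maximizes the (probability-weighted) sum of rewards.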

(dima) #3

Thank you so much for your reply :slight_smile: it helped me a lot. But now I have questions about nn.BCEWithLogitsLoss and using it with weights.
For example, I have samples from 2 classes: y = [0,1,1,1,1,0,0,1,0],
and I'd like to get the highest recall. How should I set the weights properly?


Based on the docs, a value of pos_weight > 1 would increase the recall.
So you could experiment with some values of pos_weight, e.g. pos_weight = 5./4.
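Putting that together for the labels above (the logits here are made-up illustrative values), a quick sketch:

```python
import torch

# The example labels from the question: 5 positives, 4 negatives
y = torch.tensor([0., 1., 1., 1., 1., 0., 0., 1., 0.])

# The suggested starting value; larger values push recall up further
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(5. / 4.))

logits = torch.tensor([0.3, -0.2, 1.1, 0.7, -0.5, -1.0, 0.4, 0.9, -0.3],
                      requires_grad=True)
loss = criterion(logits, y)
loss.backward()
```

From there you would tune pos_weight on a validation set, watching recall and precision together.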