Learning doesn't work with custom loss function


(dima) #1

I’ve just started with PyTorch and am trying to understand how to deal with custom loss functions, especially some non-trivial ones.

Problem 1. I’d like to encourage my network to maximize the true positive rate and at the same time minimize the false discovery rate. For example, increase the total score by +2 for each true positive and decrease it by 5 for each false positive.

def tp_fp_loss(yhat, y):
    total_score = 0
    for i in range(y.size(0)):
        if is_tp(yhat[i], y[i]):   # is_tp / is_fp are my own helper predicates (not shown)
            total_score += 2
        if is_fp(yhat[i], y[i]):
            total_score -= 5
    return -total_score

Problem 2. When y is a list of positive and negative rewards (y = [10, -5, -40, 23, 11, -7]), encourage the network to maximize the sum of rewards.

def max_reward_loss(yhat, y):
    r = torch.autograd.Variable(torch.Tensor(y[yhat >= .5]), requires_grad=True).sum()
    return -r

Maybe I don’t completely understand some of the autograd mechanics; the functions I implemented calculate the loss correctly, but learning with them doesn’t work :frowning: What am I doing wrong? Can anybody help me with a working solution to either of these problems?


#2

In your first problem, Autograd won’t be able to calculate the gradients for the model parameters if you try to call backward on total_score, as the computation graph is detached from this variable.
You would need to calculate total_score using your model’s output and target in some kind of loss function.
In general, you could use nn.BCEWithLogitsLoss with pos_weight, which makes it possible to trade off recall and precision.
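
To make this concrete, here is a minimal sketch (the shapes and the pos_weight value are just placeholders, not taken from your setup) showing the loss being computed directly from the raw model outputs, so the graph stays attached:

import torch
import torch.nn as nn

# Placeholder logits standing in for the raw model outputs, e.g. model(x).squeeze(1)
logits = torch.randn(8, requires_grad=True)
targets = torch.tensor([0., 1., 1., 0., 1., 1., 0., 1.])

# pos_weight > 1 penalizes missed positives more heavily,
# trading some precision for recall.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([2.0]))
loss = criterion(logits, targets)
loss.backward()  # gradients flow back through logits to the model parameters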

Also, in your second problem you are re-wrapping your tensor into a Variable, thus detaching the computation graph.
Try to use the tensors directly without creating a new tensor.
That being said, Variables have been deprecated since 0.4.0; you can use tensors directly now and should set requires_grad=True if your tensor needs gradients.
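
As an illustration only (soft_reward_loss is just a made-up name, and this is one possible differentiable variant, not necessarily the behavior you want): note that indexing with yhat >= .5 is a hard threshold, so even without the re-wrapping no gradient would flow back to yhat through the mask. Weighting the rewards by the predicted probabilities instead keeps the graph connected:

import torch

def soft_reward_loss(probs, rewards):
    # probs: model outputs in [0, 1] (e.g. after sigmoid), attached to the graph
    # rewards: plain data tensor, no gradients needed
    # Weighting each reward by its predicted probability keeps gradients
    # flowing to the model, unlike selecting with a hard probs >= 0.5 mask.
    return -(probs * rewards).sum()

probs = torch.sigmoid(torch.randn(6, requires_grad=True))  # stand-in for model outputs
rewards = torch.tensor([10., -5., -40., 23., 11., -7.])
loss = soft_reward_loss(probs, rewards)
loss.backward()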


(dima) #3

Thank you so much for your reply :slight_smile: it helped me a lot. But now I have questions about nn.BCEWithLogitsLoss and how to use it with weights.
For example, I have samples from 2 classes: y = [0,1,1,1,1,0,0,1,0]
and I’d like to get the highest recall; how should I set up the weights properly?


#4

Based on the docs, a value of pos_weight > 1 would increase the recall.
So you could experiment with some values of pos_weight, e.g. pos_weight = 5./4.
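
For example, with the 5 positive and 4 negative samples in your target vector, something along these lines (the logits are just a stand-in for your model outputs, and the exact pos_weight value is something to tune):

import torch
import torch.nn as nn

targets = torch.tensor([0., 1., 1., 1., 1., 0., 0., 1., 0.])
logits = torch.randn(9, requires_grad=True)  # stand-in for the model outputs

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([5. / 4.]))
loss = criterion(logits, targets)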