Create an F-score loss function

Hi, I’m trying to implement an F-score loss function. I took the implementation from this GitHub gist: [https://gist.github.com/SuperShinyEyes/dcc68a08ff8b615442e3bc6a9b55a354] and tweaked it a bit to suit my needs. But when I run the code, the gradients are None, even though I’ve set requires_grad to True on the first variable, as mentioned in https://discuss.pytorch.org/t/runtimeerror-element-0-of-variables-does-not-require-grad-and-does-not-have-a-grad-fn/11074.
Some specifics:

  1. I need the output from the network to be a binary vector, as it represents activities.
  2. The quantity I’m measuring is prediction of actions, hence the need for the F-score loss.

Here is a basic example:

import torch

class FScoreLoss(torch.nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        
    def forward(self, y_true, y_pred, beta, grad=True):
        print(f'y_true = {y_true}')
        print(f'y_pred = {y_pred}')

        tp = (y_true * y_pred).sum().to(torch.float32)
        fn = (y_true * (1 - y_pred)).sum().to(torch.float32)   # positives that were predicted as negative
        fp = ((1 - y_true) * y_pred).sum().to(torch.float32)   # negatives that were predicted as positive

        print(f'tp = {tp}, fn = {fn}, fp = {fp}')

        precision = tp / (tp + fp + self.eps)
        recall = tp / (tp + fn + self.eps)

        f_score_loss = (1 + beta ** 2) * (precision * recall) / ((beta**2)*precision + recall + self.eps)

        print(f'precision = {precision}, recall = {recall}, f_score = {f_score_loss}')
    
        return f_score_loss

f_score_loss_func = FScoreLoss()

model = torch.nn.Linear(10, 10)

x = torch.randn(1, 10).requires_grad_(True)

y_true = (torch.randn(1, 10)[0] > .5).float()

y_pred = (model(x)>.5).float().requires_grad_(True)

loss = f_score_loss_func(y_true, y_pred, beta=2., grad=True)
loss.backward()

print(f'y_pred.grad = {y_pred.grad}')

print(f'model.weight.grad = {model.weight.grad}')

As can be seen from the output of the last two print statements:

y_pred.grad = tensor([[-0.4082,  0.3061, -0.4082, -0.4082, -0.4082, -0.4082,  0.3061, -0.4082,
          0.3061, -0.4082]])
model.weight.grad = None

the gradients are being calculated for y_pred, but they never reach the model’s weights.

Appreciate any help. Thanks

Hi,
The result of (model(x) > .5).float() is a binary tensor, and the thresholding operation is not differentiable, so no gradients can flow back through it. You might want to use tanh() instead (see this thread on a similar issue: Step Activation Function).
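For example, here is a rough sketch (my own, not from that thread) where y_pred is kept continuous with torch.sigmoid (or a scaled tanh) instead of the hard threshold, reusing model, x, y_true and f_score_loss_func from your post; minimizing 1 - F is one common choice so that a better score means a smaller loss:

y_pred = torch.sigmoid(model(x))                        # soft predictions in (0, 1), still differentiable
loss = 1 - f_score_loss_func(y_true, y_pred, beta=2.)   # treat 1 - F as the quantity to minimize
loss.backward()
print(model.weight.grad)                                # now populated instead of None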

Thanks for your reply. The thing is that I need the vector to be binary, because I want to calculate the true/false positives in the function. Is there a workaround that you are aware of?

Also, consider the following example where I use a binary vector a to produce another variable b and compute gradients on both:

import torch

a = (torch.randn(5) > .5).float().requires_grad_(True)
print ("a", a)

b = a.abs().mean()*(torch.sign(a))
print ("b", b)

b.retain_grad()
b.sum().backward()

print ("a.grad", a.grad)
print ("b.grad", b.grad)

In a.grad you will get the gradients of b w.r.t. a. That is valid because b is a continuous (differentiable) function of a. But if you try to differentiate through the thresholding that produced a, that is not possible, because the step is non-continuous.

What you are trying to do is more along the lines of :

r = torch.randn(5).requires_grad_(True) # ~ model.weight
print("r", r)

a = (r > .5).float().requires_grad_(True) # ~ y_pred
print ("a", a)

b = a.abs().mean()*(torch.sign(a)) # ~ loss
print ("b", b)

b.retain_grad()
b.sum().backward()

print ("a.grad", a.grad)
print ("b.grad", b.grad)

print("r.grad", r.grad) # None, because a is not continuous


So, what would be the way to train a network on an F-score? Is it possible?

Is it possible, with regard to the original post, to update the weights by:

loss.backward()
model.weight.grad = torch.zeros_like(model.weight) + y_pred.grad
optimizer.step()

and then call the optimizer to update the weights of the model?

AFAIK the F-score is ill-suited as a loss function for training a network. It is better suited to judging a classifier’s predictive performance, but it does not hold enough information for the neural network to improve its predictions.

Loss functions need to be differentiable so that they can propagate gradients throughout the network using the chain rule (see “backpropagation”).
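For example, a common pattern (just a sketch, reusing the FScoreLoss module from the original post) is to train with a differentiable loss such as torch.nn.BCEWithLogitsLoss and only compute the F-score on the thresholded predictions as an evaluation metric:

import torch

model = torch.nn.Linear(10, 10)
criterion = torch.nn.BCEWithLogitsLoss()              # differentiable training loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(1, 10)
y_true = (torch.randn(1, 10) > .5).float()

logits = model(x)
loss = criterion(logits, y_true)                      # gradients flow back to the weights
loss.backward()
optimizer.step()

# F-score used only as a metric, on the hard (non-differentiable) predictions
with torch.no_grad():
    y_pred = (torch.sigmoid(logits) > .5).float()
    f2 = f_score_loss_func(y_true, y_pred, beta=2.)
    print(f'F2 metric = {f2}')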


Thanks for the explanation!