Loss.grad = None

Hi there,

I tried to build my own loss function, but loss.grad is None even though loss.requires_grad is True.
Here's my loss function:

def recall_prec(y_true, y_pred, gamma=1,beta=1):
    #y_pred_bin = torch.where(y_pred >= 0.5, torch.ones_like(y_pred), torch.zeros_like(y_pred))
    #y_pred.requires_grad=True
    masked_pred = y_true * y_pred
    true_pos = torch.sum(masked_pred)
    false_negative = torch.sum(y_true) - true_pos
    false_pos = torch.sum(y_pred - masked_pred)
    
    # If true_pos + false_pos or true_pos + false_negative is 0 we would divide by zero, so to be safe I added 1; it shouldn't affect the result much.
    prec = true_pos / (true_pos + false_pos+1)
    recall = true_pos / (true_pos + false_negative+1)
    
    #print("prec", prec, "recall", recall)
    loss = ( ( (1-recall)*gamma + (1-prec)*beta )**2  )
    #loss.requires_grad = True
    print(loss.grad,loss.requires_grad)
    return loss

I’m using SGD as the optimizer. I also tried Adam but faced the same problem.
Can you please help me with that?

Gradients are computed during the backward pass, and your code does not show any backward call, so a None .grad attribute is expected at that point.
Also, since the loss is a non-leaf tensor, you would need to call retain_grad() on it before calling backward() in order to print its gradient afterwards.
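
As a rough sketch of how this could look with the loss function from the question: the `y_true` values and the `logits` tensor below are made-up stand-ins for the real labels and model output, so treat them as assumptions rather than the actual training setup.

```python
import torch

def recall_prec(y_true, y_pred, gamma=1, beta=1):
    # Same computation as in the question, without the debug print
    masked_pred = y_true * y_pred
    true_pos = torch.sum(masked_pred)
    false_negative = torch.sum(y_true) - true_pos
    false_pos = torch.sum(y_pred - masked_pred)
    prec = true_pos / (true_pos + false_pos + 1)
    recall = true_pos / (true_pos + false_negative + 1)
    return ((1 - recall) * gamma + (1 - prec) * beta) ** 2

# Hypothetical labels and raw model outputs (assumptions for illustration)
y_true = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])
logits = torch.tensor([0.5, -1.0, 2.0, -0.3, 0.1], requires_grad=True)
y_pred = torch.sigmoid(logits)

loss = recall_prec(y_true, y_pred)
loss.retain_grad()        # keep .grad on this non-leaf tensor
grad_before = loss.grad   # still None: backward has not run yet

loss.backward()
print(grad_before)        # None
print(loss.grad)          # tensor(1.)
print(logits.grad)        # gradient w.r.t. the leaf tensor
```

Printing inside the loss function, as in the question, happens before backward() runs, which is why it can only ever show None there.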

Thank you very much for your reply.
I tried printing loss.grad after calling loss.backward() and still got NaN.

NaN means “Not a Number”, while None indicates the attribute was never set, so you need to be clear about which one is actually returned.

It also works for me using retain_grad in this small code snippet:

import torch
import torch.nn as nn

lin = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = lin(x)

loss = out.mean()
loss.retain_grad()
loss.backward()
print(loss.grad)
# tensor(1.)

I’m so sorry for that mistake, I meant None.
I’ll try retain_grad and give you an update.

So quick update:

loss.grad doesn’t give me None any more, thank you very much for your help.

It now gives me tensor(1.), and it has kept giving me the same result for over 20 epochs. The loss is changing, but loss.grad doesn’t change. Is that normal, or do I need to do something differently?
I’m using SGD as an optimizer.

Yes, this is expected: calling backward() on a scalar loss implicitly passes a gradient of 1, since dLoss/dLoss = 1.
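
To make that concrete, here is a small sketch with a toy linear model and random data (both are assumptions, not the setup from the question). It records loss.grad next to the gradient norm of the weights over a few SGD steps: the former stays at 1 while the loss and the parameter gradients change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data, purely for illustration
lin = nn.Linear(10, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)
x = torch.randn(16, 10)
target = torch.randn(16, 1)

loss_grads, weight_grad_norms, losses = [], [], []
for step in range(3):
    opt.zero_grad()
    loss = ((lin(x) - target) ** 2).mean()
    loss.retain_grad()   # needed to read loss.grad after backward
    loss.backward()      # implicit gradient of 1 for a scalar loss

    loss_grads.append(loss.grad.item())
    weight_grad_norms.append(lin.weight.grad.norm().item())
    losses.append(loss.item())
    opt.step()

print(loss_grads)         # every entry is 1.0
print(losses)             # changes from step to step
print(weight_grad_norms)  # changes from step to step
```

So if the goal is to check that training is progressing, the .grad attributes of the parameters (here lin.weight.grad) are the ones worth inspecting, not loss.grad.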

Thank you very much for the info!