Hi there,

I tried to build my own loss function, but I’m running into a problem: `loss.grad` is `None`, although `loss.requires_grad` is `True`.

Here’s my loss function:

```
def recall_prec(y_true, y_pred, gamma=1, beta=1):
    # Keep y_pred continuous; thresholding it
    # (e.g. torch.where(y_pred >= 0.5, ...)) would break differentiability.
    masked_pred = y_true * y_pred
    true_pos = torch.sum(masked_pred)
    false_negative = torch.sum(y_true) - true_pos
    false_pos = torch.sum(y_pred - masked_pred)
    # If true_pos + false_pos == 0 or true_pos + false_negative == 0 we would
    # divide by zero, so to be safer I added 1; it won't affect the result much.
    prec = true_pos / (true_pos + false_pos + 1)
    recall = true_pos / (true_pos + false_negative + 1)
    loss = ((1 - recall) * gamma + (1 - prec) * beta) ** 2
    print(loss.grad, loss.requires_grad)
    return loss
```
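To illustrate, even a much simpler loss of the same shape shows this behavior for me (the tensors below are just placeholders, not my real data):

```
import torch

# Placeholder tensors just to reproduce the behavior;
# my real y_pred comes from the model's output.
y_true = torch.tensor([1.0, 0.0, 1.0])
y_pred = torch.sigmoid(torch.randn(3, requires_grad=True))

loss = (1 - (y_true * y_pred).sum() / (y_true.sum() + 1)) ** 2
print(loss.requires_grad)  # True
print(loss.grad)           # None
```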

I’m using SGD as the optimizer; I also tried Adam but faced the same problem.

Can you please help me with that?

Gradients will be calculated during the backward pass, and your code does not show any `backward` call, so seeing a `None` `.grad` attribute would be expected.

Also, in case you are calling `backward` on the `loss` tensor, you might need to call `retain_grad()` on it in order to print the gradient afterwards.

Thank you very much for your reply.

I tried it and printed `loss.grad` after calling `loss.backward()`, but I still got NaN.

`NaN` corresponds to “Not a Number”, while `None` indicates the attribute wasn’t set, so you need to be clear about what exactly is returned.

It also works for me using `retain_grad` in this small code snippet:

```
import torch
import torch.nn as nn

lin = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = lin(x)
loss = out.mean()
loss.retain_grad()
loss.backward()
print(loss.grad)
# tensor(1.)
```

I’m so sorry for that mistake, I meant `None`.

I’ll try `retain_grad` and give you an update.

So quick update:

`loss.grad` doesn’t give me `None` any more (thank you very much for your help).

It gives me `tensor(1.)`, and it has kept giving me the same result for over 20 epochs now; the loss is changing, but `loss.grad` doesn’t change. Is that normal, or do I need to do something differently?

I’m using SGD as an optimizer.

Yes, this is expected: you are passing an implicit `1` to the `backward` call, since `dLoss/dLoss = 1`.
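A minimal sketch of this, using a toy linear model rather than the original training code: `loss.grad` stays `tensor(1.)` on every step, while the parameter gradients are what actually change and drive the updates.

```
import torch
import torch.nn as nn

# Toy model and optimizer standing in for the real training loop.
lin = nn.Linear(10, 10)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

for step in range(3):
    x = torch.randn(1, 10)
    loss = lin(x).mean()
    loss.retain_grad()  # needed because loss is a non-leaf tensor
    opt.zero_grad()
    loss.backward()
    print(loss.grad)               # tensor(1.) on every step
    print(lin.weight.grad.norm())  # changes from step to step
    opt.step()
```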


Thank you very much for the info