Custom Loss Function - Error: element 0 of tensors does not require grad and does not have grad_fn

Hi everyone! I’m trying to implement the global pair loss function from “Recognition of Action Units in the Wild with Deep Nets and a New Global-Local Loss”.

I have to look at all possible pairs of the predictions. If the two values in a pair are the same, then g_prediction = 1; if they are different, then g_prediction = 0.

I do the same thing for the targets.

Then I need to calculate the loss by adding all of these pair terms together:

total += (g_prediction - g_target)**2

This is my implementation:

def custom_loss(preds, targs):
    total = 0
    for pair_p, pair_t in zip(torch.combinations(preds, r=2), torch.combinations(targs, r=2)):
        g1 = torch.eq(pair_p[0], pair_p[1]).type(torch.uint8)
        g2 = torch.eq(pair_t[0], pair_t[1]).type(torch.uint8)
        total += (g2 - g1)**2
    return total

output = model(x)
loss = custom_loss(output, targets)
loss.backward()

I’m getting this error after backward() is called:

RuntimeError: element 0 of tensors does not require grad and does not have grad_fn

Am I retaining the graph properly? Does anyone know how to fix this?

Thanks for the help

Hi, try this.

output = model(x)
loss = custom_loss(output, targets)
loss.requires_grad = True
loss.backward()

When I do that I no longer get the error; however, when I…

print(model.weight.grad)

… it still prints None, so it doesn’t seem to be the solution.
I think the way I’m adding or computing the g1 and g2 values might still not be correct.

That is the correct way.
Try this:
[x.grad.data for x in model.parameters()] after loss.backward().
If the gradients are calculated, there won’t be any Nones in the list.

Thanks for your help. When I try that I still get an error:

AttributeError: 'NoneType' object has no attribute 'data'

When I try

[x.grad for x in model.parameters()]

I get a list of Nones. :sweat_smile:

I now changed the loss function to this, so I only use torch functions:

def custom_loss(p, t):
    total = torch.zeros(1, dtype=torch.float, requires_grad=True)
    for pair_p, pair_t in zip(torch.combinations(p, r=2), torch.combinations(t, r=2)):
        g1 = torch.eq(torch.gather(pair_p, 0, torch.tensor([0])), torch.gather(pair_p, 0, torch.tensor([1]))).type(torch.uint8)
        g2 = torch.eq(torch.gather(pair_t, 0, torch.tensor([0])), torch.gather(pair_t, 0, torch.tensor([1]))).type(torch.uint8)
        total += (g2 - g1)**2
    return total

model = nn.Linear(3, 5)
output = model(x)

loss = custom_loss(output, t)
loss.backward()
print('loss = ', loss)

And the loss now prints as:

loss = tensor([10.], device='cuda:0', dtype=torch.float32, grad_fn=<AddBackward0>)

But when I check the gradients, they are still None. Do you know why this is?

print(model.weight.grad)
# None
print([x.grad for x in model.parameters()])
# [None, None]

Can you provide a sample of x and t?

Yes sure. I just used random tensors for testing, like these:

t = torch.tensor([1, 1, 1, 1, 1])
x = torch.tensor([2, 3, 3], dtype=torch.double)

Hi, sorry, unfortunately I can’t seem to debug this.
@ptrblck, could you have a look at this?


No worries, thanks for helping so far. : )

torch.eq will break the computation graph and the output will not have a valid grad_fn associated with it as seen here:

x = torch.randn(1, requires_grad=True)
y = torch.randn(1)
z = torch.eq(x, y)
print(z)
> tensor([False])
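
You can also check the attributes directly to confirm the result is detached from the computation graph:

print(z.requires_grad, z.grad_fn)
> False None

Since the comparison result is detached, no gradient can flow back through it to the model output, which is why the model parameters still end up with grad = None.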

Are you sure you need to calculate total using g1 and g2, or could you use g1 and g2 to somehow “index” preds and targs and calculate the paired loss function?


Thank you for your response. I now changed the function to this and used indexing on pair_p and pair_t.

def custom_loss(p, t):
    total = torch.zeros(1, dtype=torch.float, requires_grad=True)
    for pair_p2, pair_t2 in zip(torch.combinations(p, r=2), torch.combinations(t, r=2)):
        pair_p, _ = torch.sort(pair_p2)
        pair_t, _ = torch.sort(pair_t2)
        pair_p = pair_p.double()
        pair_t = pair_t.double()
        
        g1 = (pair_p[0] == pair_p[1]).type(torch.uint8)
        g2 = (pair_t[0] == pair_t[1]).type(torch.uint8)
        total = total + (torch.max(pair_t[g2.item()], g2.type(torch.double)) - torch.max(pair_p[g1.item()], g1.type(torch.double)))**2
    return total

t = torch.tensor([1, 1, 0, 0, 1])
x = torch.tensor([2, 3, 3], dtype=torch.double)

model = nn.Linear(3, 5)
output = model(x)

loss = custom_loss(output, t)
loss.backward()
print('loss = ', loss)
# loss = tensor([4.], device='cuda:0', grad_fn=<AddBackward0>)

Now I don’t get a list of Nones anymore, but instead I get tensors full of zeros.

print([x.grad for x in model.parameters()])
#     [tensor([[0., 0., 0.],
#              [0., 0., 0.],
#              [0., 0., 0.],
#              [0., 0., 0.],
#              [0., 0., 0.]], device='cuda:0'), tensor([0., 0., 0., 0., 0.], device='cuda:0')]

Do you know what might cause this?

Your initialization might be “unlucky”, as I got some valid gradients in a couple of iterations (I also got an all-zero gradient output):

tensor([[ 0.0000,  0.0000,  0.0000],
        [ 6.8986, 10.3479, 10.3479],
        [ 0.0000,  0.0000,  0.0000],
        [13.0105, 19.5157, 19.5157],
        [ 8.8299, 13.2449, 13.2449]])
tensor([0.0000, 3.4493, 0.0000, 6.5052, 4.4150])

Note that torch.max will return a gradient of 1 for the max value and 0 everywhere else.
Based on the input and your calculations you might end up removing the gradient in the last operation.
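
A small standalone example (just two scalar values for illustration) shows this behavior; the gradient only flows to the larger input:

a = torch.tensor([2., 5.], requires_grad=True)
out = torch.max(a[0], a[1])
out.backward()
print(a.grad)
> tensor([0., 1.])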


Thank you so much! That helped a lot.
I might have to find a different way than using torch.max then. I somehow have to get a value of 1 if the pair consists of [0, 0] or [1, 1] and a value of 0 in all other cases, so I’ll have to think about that some more.

You might experiment with a “soft” step function, such as sigmoid or x / (sqrt(x*x + delta)) * 0.5 + 0.5, where a smaller delta more closely approximates the hard step. This would yield smoother gradients and might help.
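
As a minimal sketch of the second option (soft_step is just an illustrative name and delta a value you would have to tune):

import torch

def soft_step(x, delta=1e-3):
    # Smooth approximation of a step at 0:
    # large negative x -> close to 0, large positive x -> close to 1.
    # A smaller delta makes the transition sharper.
    return x / torch.sqrt(x * x + delta) * 0.5 + 0.5

x = torch.linspace(-1., 1., steps=5, requires_grad=True)
print(soft_step(x, delta=1e-4))  # differentiable, values in (0, 1)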

Let me know if these approaches work.


In the actual classifier I’m using, I apply a Sigmoid() in the last layer, so that the classifier outputs are between 0 and 1. The custom loss function I’m trying to implement is supposed to be an extra regularization loss; for the main loss function I use BCEWithLogitsLoss().

For the extra regularization loss I then wanted to bring the predicted values into the same format as the targets, i.e. setting predicted[predicted < 0.5] = 0 and predicted[predicted >= 0.5] = 1.

I tested this again: when I don’t apply this thresholding in my custom loss function, I also get gradients, like you did during your iterations. However, when I add this change, I just get 0 gradients again, no matter how often I run the program. Shouldn’t I at least still get gradients through the values that were changed to 1? I’m a bit confused about this.

def custom_loss(preds, t):
    total = torch.zeros(1, dtype=torch.float, requires_grad=True)

    p = preds.clone()
    p[preds < 0.5] = 0
    p[preds >= 0.5] = 1

    for pair_p2, pair_t2 in zip(torch.combinations(p, r=2), torch.combinations(t, r=2)):
        pair_p, _ = torch.sort(pair_p2)
        pair_t, _ = torch.sort(pair_t2)
        pair_p = pair_p.double()
        pair_t = pair_t.double()
        
        g1 = (pair_p[0] == pair_p[1]).type(torch.uint8)
        g2 = (pair_t[0] == pair_t[1]).type(torch.uint8)
        total = total + (torch.max(pair_t[g2.item()], g2.type(torch.double)) - torch.max(pair_p[g1.item()], g1.type(torch.double)))**2
    return total
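
One thing I might try, based on the soft step suggested above, is replacing the hard thresholding with a differentiable version so gradients can still flow through preds (soft_threshold and delta are just my own sketch, not from the paper):

def soft_threshold(preds, delta=1e-3):
    # Differentiable stand-in for the hard 0/1 thresholding at 0.5:
    # values below 0.5 are pushed towards 0, values above 0.5 towards 1,
    # while keeping a nonzero gradient w.r.t. preds.
    centered = preds - 0.5
    return centered / torch.sqrt(centered * centered + delta) * 0.5 + 0.5

# instead of the clone() and in-place assignments above:
# p = soft_threshold(preds)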