.grad attribute of a non-leaf tensor being accessed

Hi there, I'm a newbie at PyTorch.

I am running into the warning: “UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won’t be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.”

I am not sure which tensor in my procedure is raising the warning, or what exactly is happening. I tried using .retain_grad() on several of these tensors, but that just turns them into NoneType objects. Any tips on how I can solve this problem (if it even is a problem)?

Hi,

This warning only means that you are accessing the .grad field of a Tensor whose .grad field PyTorch will never populate.
You can run your code with python -W error your_script.py to make Python raise an error when the warning occurs, which will show you exactly where it happens.
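If it is more convenient, you can get the same effect from inside the script itself with the standard warnings module (a minimal sketch):

import warnings

warnings.filterwarnings("error", category=UserWarning)  # raise UserWarnings as errors so the traceback points to the exact line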

The gist of the problem here is that only leaf Tensors will have their .grad field populated. So if this warning happens, it means that you think something is a leaf while it isn't. This usually happens if you perform operations on a Tensor that requires gradients; a common mistake is foo = torch.rand(10, requires_grad=True).to(device). In this case, foo won't be a leaf because it is the result of the .to() operation.
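A minimal illustration of the difference (assuming a CUDA device is available):

import torch

foo = torch.rand(10, requires_grad=True).to("cuda")      # non-leaf: result of the .to() op
bar = torch.rand(10, requires_grad=True, device="cuda")  # leaf: created directly on the device

print(foo.is_leaf, bar.is_leaf)  # False True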

15 Likes

Thank you, very helpful reply!

Hi!
What if I want to know the output grad of a leaf variable? I am getting only the warning and not the result. How can I get that?
Thanks.

Hi.

What if I want to know the output grad of a leaf variable?

What do you mean by that? For a leaf Tensor, if it was used in the computation, its .grad field will be populated after you call backward().
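For example (a minimal sketch):

import torch

w = torch.rand(3, requires_grad=True)  # leaf tensor
loss = (w * 2).sum()
loss.backward()
print(w.grad)  # tensor([2., 2., 2.])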

1 Like

My code was working well for many epochs and iterations (±25000 iterations). But suddenly I get this error with this warning. What's going wrong?

Hi,

I guess you changed something in your model? Some parameters?

Hi @albanD,

I faced the same problem while trying to do saliency on my model.

    def test(self):
        self.model.train()
        self.model.dropout.eval()

        for indx, data, target, filename in self.test_loader:

            data, target = data.to(self.device), target.to(self.device)
            predictions = self.model(data.float())
            predictions.mean().backward()
            saliency = predictions.grad.data.abs().squeeze() 
            saliency_list = saliency.detach().cpu().numpy() 
            torch.sigmoid(predictions) 

I get an error at saliency = predictions.grad.data.abs().squeeze() when I am trying to access the gradient data.

This is the full error message I get:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  """Entry point for launching an IPython kernel.

*** AttributeError: 'NoneType' object has no attribute 'data'

Is there a way of sorting this out?

2 Likes

Hi,

This is a warning, not an error, right?
It is just telling you that you are accessing the .grad field of a Tensor whose .grad field will never be populated. You can find where the warning is raised to see where this access happens.
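For reference, here is a minimal sketch of one way to get a saliency map in this setting, assuming the goal is the gradient of the output with respect to the input data (variable names taken from the snippet above):

data = data.to(self.device).float()
data.requires_grad_()                  # make the input a leaf that records gradients
predictions = self.model(data)
predictions.mean().backward()          # populates data.grad
saliency = data.grad.abs().squeeze()   # gradient of the mean output w.r.t. the input
saliency_list = saliency.detach().cpu().numpy()

If what you actually want is the gradient with respect to predictions itself, call predictions.retain_grad() before the backward() call and then read predictions.grad.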

I was facing the same problem, but then realized (as @albanD already said) that I had not called loss.backward(), which populates the grad fields of the tensors. After doing it, the error went away.

Hello,
I want to add to what @albanD said.
Indeed, transferring a newly created random tensor to the GPU inside the class __init__() function results in this warning. Moreover, if you want to see the gradients attached to that random tensor, you will get None. The solution to that problem is to transfer your random tensor to the GPU only during the forward pass, NOT in the class definition itself.

To replicate the problem you can run this code (let us call it the problematic code):

import torch
import torch.nn as nn
mdev = torch.device("cuda:0")
torch.manual_seed(123)

class mclass(torch.nn.Module):
    def __init__(self):
        super(mclass, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.a1 = nn.Sigmoid()
        self.fc2 = nn.Linear(5,1)
        self.a2 = nn.Sigmoid()
        self.mw = torch.rand(5,5, requires_grad=True).to(mdev)  # non-leaf: result of the .to() call
    def forward(self,b):
        b = self.a1(self.fc1(b))
        b = b @ self.mw
        b = self.a2(self.fc2(b))
        return b

# ----- RUN -----        
tmodel = mclass().to(mdev)

a = torch.round(torch.rand(4,1)).to(mdev)
b = torch.rand(4,10).to(mdev)

CE = torch.nn.BCELoss()

pred = tmodel.forward(b)
loss = CE(pred,a)
print(loss)
loss.backward()
print('Gradients attached on my random tensor are:',tmodel.mw.grad)

You will get the above user warning and None gradients.

The solution is to change your class definition like the following:

class mclass(torch.nn.Module):
    def __init__(self):
        super(mclass, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.a1 = nn.Sigmoid()
        self.fc2 = nn.Linear(5,1)
        self.a2 = nn.Sigmoid()
        self.mw = torch.rand(5,5, requires_grad=True) # <----- created on the CPU; still a leaf
    def forward(self,b):
        b = self.a1(self.fc1(b))
        b = b @ self.mw.to(mdev) # <----- moved to the device only in the forward pass
        b = self.a2(self.fc2(b))
        return b

Now you can see the attached gradients with no warning.

However, if you are using the CPU instead of the GPU, then you will see the attached gradients and no user warning even with the problematic code. Change mdev = torch.device("cuda:0") to mdev = torch.device("cpu") in the code and run; it will work normally. I do not know why that is happening (presumably .to() returns the original tensor when the device already matches, so self.mw stays a leaf on the CPU).

1 Like

If the .to() operation makes foo a non-leaf when requires_grad=True is set, how can I move the data to the device so that foo is still a leaf?

Specify the device argument directly during tensor creation, register the tensor as a parameter first and move it to the device by calling .to() on the parent module, or call .requires_grad_() after moving the tensor to the device.
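A minimal sketch of those three options (assuming a CUDA device is available; MyModule is just an illustrative name):

import torch
import torch.nn as nn

device = torch.device("cuda:0")

# Option 1: create the tensor on the target device directly.
foo = torch.rand(10, requires_grad=True, device=device)

# Option 2: register the tensor as an nn.Parameter; calling .to() on the parent
# module moves the parameter in place, so it stays a leaf.
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.mw = nn.Parameter(torch.rand(5, 5))

model = MyModule().to(device)

# Option 3: move first, then mark the moved tensor as requiring gradients.
bar = torch.rand(10).to(device).requires_grad_()

print(foo.is_leaf, model.mw.is_leaf, bar.is_leaf)  # True True True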