.grad attribute of a non-leaf tensor being accessed

Hi there, I'm a newbie at PyTorch.

I am running into the warning: “UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won’t be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.”

I am not sure which tensor in my procedure is raising the warning, or what exactly is happening. I tried using .retain_grad() on several of these tensors, but that just turns them into NoneType objects. Any tips on how I can solve this problem (if it even is a problem)?

Hi,

This warning only means that you are accessing the .grad field of a Tensor whose .grad field PyTorch will never populate.
You can run your code with python -W error your_script.py to make Python raise an error when the warning occurs, which will show you exactly where it happens.
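If it is more convenient, you can get the same effect from inside the script itself with the standard warnings module (a minimal sketch):

import warnings

warnings.filterwarnings("error", category=UserWarning)  # raise UserWarnings as errors so the traceback points to the exact line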

The gist of the problem here is that only leaf Tensors will have their .grad field populated. So if this warning happens, it means that you think something is a leaf while it isn't. This usually happens if you perform operations on a Tensor that requires gradients; a common mistake is foo = torch.rand(10, requires_grad=True).to(device). In this case, foo won't be a leaf because it is the result of the .to() operation.
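A minimal illustration of the difference (assuming a CUDA device is available):

import torch

foo = torch.rand(10, requires_grad=True).to("cuda")      # non-leaf: result of the .to() op
bar = torch.rand(10, requires_grad=True, device="cuda")  # leaf: created directly on the device

print(foo.is_leaf, bar.is_leaf)  # False True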

15 Likes

Thank you, very helpful reply!

Hi!
What if I want to know the output grad of a leaf variable? I am getting only the warning and not the result. How can I get that?
Thanks.

Hi.

What if I want to know the output grad of a leaf variable?

What do you mean by that? For a leaf Tensor, if it was used in the computation, its .grad field will be populated after you call backward().
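For example (a minimal sketch):

import torch

w = torch.rand(3, requires_grad=True)  # leaf tensor
loss = (w * 2).sum()
loss.backward()
print(w.grad)  # tensor([2., 2., 2.])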

1 Like

My code was working well for many epochs and iterations (±25000 iterations). But suddenly I get this error with this warning. What's going wrong?

Hi,

I guess you changed something in your model? Some parameters?

Hi @albanD,

I faced the same problem while trying to do saliency on my model.

    def test(self):
        self.model.train()
        self.model.dropout.eval()

        for indx, data, target, filename in self.test_loader:

            data, target = data.to(self.device), target.to(self.device)
            predictions = self.model(data.float())
            predictions.mean().backward()
            saliency = predictions.grad.data.abs().squeeze() 
            saliency_list = saliency.detach().cpu().numpy() 
            torch.sigmoid(predictions) 

I get an error at saliency = predictions.grad.data.abs().squeeze() when I am trying to access the gradient data.

This is the full error message I get:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  """Entry point for launching an IPython kernel.

*** AttributeError: 'NoneType' object has no attribute 'data'

Is there a way of sorting this out?

2 Likes

Hi,

This is a warning, not an error, right?
It is just telling you that you are accessing the .grad field of a Tensor whose .grad field will never be populated. You can find where the warning is raised to see where this access happens.
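For reference, here is a minimal sketch of one way to get a saliency map in this setting, assuming the goal is the gradient of the output with respect to the input data (variable names taken from the snippet above):

data = data.to(self.device).float()
data.requires_grad_()                  # make the input a leaf that records gradients
predictions = self.model(data)
predictions.mean().backward()          # populates data.grad
saliency = data.grad.abs().squeeze()   # gradient of the mean output w.r.t. the input
saliency_list = saliency.detach().cpu().numpy()

If what you actually want is the gradient with respect to predictions itself, call predictions.retain_grad() before the backward() call and then read predictions.grad.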

I was facing the same problem, but then realized (as @albanD already said) that I had not called loss.backward(), which populates the grad fields of the tensors. After doing it, the error went away.

Hello,
I want to add to what @albanD said.
Indeed, transferring a newly created random tensor to the GPU inside the class __init__() function results in this warning. Moreover, if you want to see the gradients attached to that random tensor, you will get None. The solution to that problem is to transfer your random tensor to the GPU only during the forward pass, NOT in the class definition itself.

To replicate the problem you can run this code (let us call it the problematic code):

import torch
import torch.nn as nn
mdev = torch.device("cuda:0")
torch.manual_seed(123)

class mclass(torch.nn.Module):
    def __init__(self):
        super(mclass, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.a1 = nn.Sigmoid()
        self.fc2 = nn.Linear(5,1)
        self.a2 = nn.Sigmoid()
        self.mw = torch.rand(5,5, requires_grad=True).to(mdev)  # non-leaf: result of the .to() call
    def forward(self,b):
        b = self.a1(self.fc1(b))
        b = b @ self.mw
        b = self.a2(self.fc2(b))
        return b

# ----- RUN -----        
tmodel = mclass().to(mdev)

a = torch.round(torch.rand(4,1)).to(mdev)
b = torch.rand(4,10).to(mdev)

CE = torch.nn.BCELoss()

pred = tmodel.forward(b)
loss = CE(pred,a)
print(loss)
loss.backward()
print('Gradients attached on my random tensor are:',tmodel.mw.grad)

You will get the above user warning and None gradients.

The solution is to change your class definition like the following:

class mclass(torch.nn.Module):
    def __init__(self):
        super(mclass, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.a1 = nn.Sigmoid()
        self.fc2 = nn.Linear(5,1)
        self.a2 = nn.Sigmoid()
        self.mw = torch.rand(5,5, requires_grad=True) # <----- created on the CPU; still a leaf
    def forward(self,b):
        b = self.a1(self.fc1(b))
        b = b @ self.mw.to(mdev) # <----- moved to the device only in the forward pass
        b = self.a2(self.fc2(b))
        return b

Now you can see the attached gradients with no warning.

However, if you are using the CPU instead of the GPU, then you will see the attached gradients and no user warning even with the problematic code. Change mdev = torch.device("cuda:0") to mdev = torch.device("cpu") in the code and run; it will work normally. I do not know why that is happening (presumably .to() returns the original tensor when the device already matches, so self.mw stays a leaf on the CPU).

1 Like

If the .to() operation makes foo a non-leaf when requires_grad=True is set, how can I move the data to the device so that foo is still a leaf?

Specify the device argument directly during tensor creation, register the tensor as a parameter first and move it to the device by calling .to() on the parent module, or call .requires_grad_() after moving the tensor to the device.
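A minimal sketch of those three options (assuming a CUDA device is available; MyModule is just an illustrative name):

import torch
import torch.nn as nn

device = torch.device("cuda:0")

# Option 1: create the tensor on the target device directly.
foo = torch.rand(10, requires_grad=True, device=device)

# Option 2: register the tensor as an nn.Parameter; calling .to() on the parent
# module moves the parameter in place, so it stays a leaf.
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.mw = nn.Parameter(torch.rand(5, 5))

model = MyModule().to(device)

# Option 3: move first, then mark the moved tensor as requiring gradients.
bar = torch.rand(10).to(device).requires_grad_()

print(foo.is_leaf, model.mw.is_leaf, bar.is_leaf)  # True True True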