UserWarning pops up only in Debug mode? The .grad attribute of a Tensor that is not a leaf Tensor is being accessed

Hi there, I'm learning PyTorch (with VS Code) and facing an odd warning.

If I run my code without breakpoints, it works just fine. However, when I set some breakpoints and run in debug mode, this UserWarning SOMETIMES pops up:


C:\ProgramData\Anaconda3\envs\Python_37_envs\lib\site-packages\torch\tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won’t be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "

Even if I keep the same code and the same breakpoints, the warning pops up only occasionally (20% of the time, give or take, and it appears a few seconds after a breakpoint pauses execution). I found one similar post,

but it does not answer my question, since there is no .to() operation in my code.

Any tips on how to solve it? Thanks.


Hi,

Does it happen when you try to access the .grad field of a given Tensor? If so, how was this Tensor created?

Following the tutorial, I was using the b.backward() function, and it throws the same error when printing b.grad.
My code is:

import torch

a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
a.requires_grad_(True)
b = (a * a).sum()
b.backward()
print(b.grad)    # b is a non-leaf, so this prints None and triggers the warning

It prints None and shows this warning:

None
/usr/local/lib/python3.6/dist-packages/torch/tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "

Any help appreciated. Thank you for the response :slight_smile:

Hi,

As you can see in the warning message, it is because you call .grad on a Tensor that is not a leaf (meaning that it has some gradient history). And because it is not a leaf, the backward call won't populate its .grad field, so your code most likely should not be doing that.

In your example, a is a leaf and so will have a .grad field.
If you need the gradient for b, you need to call b.retain_grad() as mentioned in the warning.
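
To make the leaf/non-leaf distinction concrete, here is a minimal sketch based on your example (the .is_leaf checks are just for illustration):

import torch

a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
a.requires_grad_(True)       # a is a leaf: it has no gradient history
b = (a * a).sum()            # b is a non-leaf: it has a grad_fn

print(a.is_leaf, b.is_leaf)  # True False

b.retain_grad()              # ask autograd to also keep b.grad
b.backward()

print(a.grad)                # populated on the leaf: 2 * a
print(b.grad)                # tensor(1.) because retain_grad() came before backward()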


Thank you :slight_smile:
I tried what you suggested and maybe it worked, but

import torch

a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
a.requires_grad_(True)
b = (a * a).sum()
b.backward()
b.retain_grad()    # called after backward(), so b.grad was not saved for that pass
print(b.grad)

it still prints None.

But when I use b.retain_grad() before b.backward(), it seems to work now, although I am wondering what it is printing:


b.retain_grad()
b.backward()
# print(b.grad)
print(b.grad)

It prints tensor(1.), maybe this is right.

You can check the docs: .retain_grad() means that the .grad field will be saved when doing backward. So you have to call it before .backward() for it to have an effect.

It prints tensor(1.), maybe this is right.

Yes, this is expected.
b is the output, so its gradient is the default backward value we use for a scalar: 1.
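
As a small illustration (just a sketch of the equivalent explicit call):

import torch

a = torch.randn(2, 2, requires_grad=True)
b = (a * a).sum()
b.retain_grad()

b.backward()                 # for a scalar output this is equivalent to
                             # b.backward(gradient=torch.tensor(1.))

print(b.grad)                # tensor(1.) -- the seed gradient db/db
print(a.grad)                # chain rule: 2 * a * 1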


Oh, I am slowly getting the hang of it.
Thanks for the kind answer.

Hi, thanks for the reply.
I got this warning when I was trying to debug this project:

I just added some code that prints out the sizes of the intermediates (in models.py), and I did not change the workflow of main.

x = torch.rand(2, 32000)
nnet = TasNet()
x = nnet(x)
s1 = x[0]
print(s1.shape)

It just creates a random x, feeds it into the TasNet model, and then prints the output. I don't think it tries to access the .grad field of any Tensor, because there is no .backward() call. I don't know how this warning is triggered.

The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
Its .grad attribute won’t be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor.

If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.

My code was working well, but after many iterations and epochs I suddenly got the above warning. I think the code still ran.

I suspect the problem is because of this:

"If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead."

Why does this happen? Why is the non-leaf Tensor suddenly being accessed, when the code worked well for many epochs and iterations?

What is wrong, and what should I do?

Hi albanD,
I also have a problem with gradient backpropagation.
My example is as follows (correct code):

import torch

w = torch.tensor([[[1.,2.,3.],[4.,5.,6.],[7.,8.,9.],[10.,11.,12.]]], requires_grad=True)
x = torch.tensor([4.], requires_grad=True)

a = torch.add(w, x)     # a = w + x
b = torch.add(w, 1)     # b = w + 1
y = torch.mul(a, b)     # y = a * b

y.backward(torch.ones_like(w))
print(w.grad)

The right result is:

tensor([[[ 7.,  9., 11.],
         [13., 15., 17.],
         [19., 21., 23.],
         [25., 27., 29.]]])

But when I change the shape of the variable w from [1, 4, 3] to [4, 3], a warning occurs and the output is None, as shown below:

import torch

w = torch.tensor([[[1.,2.,3.],[4.,5.,6.],[7.,8.,9.],[10.,11.,12.]]], requires_grad=True)
w = w.squeeze()
x = torch.tensor([4.], requires_grad=True)

a = torch.add(w, x)     # a = w + x
b = torch.add(w, 1)     # b = w + 1
y = torch.mul(a, b)     # y = a * b

y.backward(torch.ones_like(w))
print(w.grad)

The warning:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.

My question is: can't I change the shape of a tensor that takes part in gradient backpropagation, whether with view, reshape, squeeze, etc.?
Waiting for your reply.

Hi,

The only problem here is that you rebind the Python variable w to point to the new squeezed Tensor. So it no longer points to the Tensor you created with requires_grad=True, and the Tensor it now points to is a non-leaf whose .grad field will not be populated.
You just need to do w_new = w.squeeze() and use w_new in your computations, and then w.grad will still be populated as you expect.
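
Applied to your example, a minimal sketch of that fix:

import torch

w = torch.tensor([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [10., 11., 12.]]],
                 requires_grad=True)          # w stays the leaf
w_new = w.squeeze()                           # non-leaf, grad_fn=<SqueezeBackward0>
x = torch.tensor([4.], requires_grad=True)

a = torch.add(w_new, x)     # a = w_new + x
b = torch.add(w_new, 1)     # b = w_new + 1
y = torch.mul(a, b)         # y = a * b

y.backward(torch.ones_like(w_new))
print(w.grad)               # populated on the leaf w, shape [1, 4, 3]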


Thank you for your guidance, it works. I think I understand what you mean:
during backpropagation, the original leaf node should not be replaced.
w = w.squeeze() changes w from a leaf with requires_grad=True into a non-leaf with grad_fn=<SqueezeBackward0>. I'll remember to check the state of the variable first.
Thank you again, and have a nice day.