Grads of all parameters are None even though retain_grad() is used

I would like to check the grad of each parameter (logits[0], logits[1], logits[2], logits[3]) with this code:

import torch
import torch.nn as nn

logits = torch.tensor([0.0, 3.0, -2.0, 1.0], requires_grad=True)
softmax = nn.Softmax(dim=0)
probs = softmax(logits)
loss = -probs[3].log()
logits.retain_grad()
loss.backward()
print(logits[0].grad)
print(logits[1].grad)
print(logits[2].grad)
print(logits[3].grad)

However, even though retain_grad() is used, the grads are all None. How can I solve this issue?

None
None
None
None
<ipython-input-78-362a14858a8e>:10: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:480.)
  print(logits[0].grad)
(the same UserWarning is repeated for print(logits[1].grad), print(logits[2].grad), and print(logits[3].grad))

Hi,
You don’t need to use retain_grad() on logits in order to access its grad attribute. See the following code -

import torch
import torch.nn as nn

logits = torch.tensor([0.0, 3.0, -2.0, 1.0], requires_grad=True)
print(logits.is_leaf)  # True, hence retain_grad() is not required
softmax = nn.Softmax(dim=0)
probs = softmax(logits)

loss = -probs[3].log()
loss.backward()
print(logits.grad)  # tensor([ 0.0418,  0.8390,  0.0057, -0.8865])

You can now index logits.grad to obtain the per-element gradients you are looking for.
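For example (a minimal sketch continuing from the snippet above; the printed values are just the elements of the logits.grad tensor shown there):

print(logits.grad[0])  # tensor(0.0418)
print(logits.grad[1])  # tensor(0.8390)
print(logits.grad[2])  # tensor(0.0057)
print(logits.grad[3])  # tensor(-0.8865)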

The reason the grad attribute of logits[0] etc. is None is that these are not leaf tensors. Indexing a leaf tensor (logits) is itself a differentiable operation, so the indexed values are the result of that operation and are therefore no longer leaves. See -

logits[0].is_leaf # False
logits[0] # tensor(0., grad_fn=<SelectBackward0>) - has a grad_fn which means it's non-leaf
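
retain_grad() is what you would use if you actually wanted gradients with respect to a genuine non-leaf tensor such as probs. As a minimal sketch of that case (same setup as above; note the call must come before backward()):

import torch
import torch.nn as nn

logits = torch.tensor([0.0, 3.0, -2.0, 1.0], requires_grad=True)
probs = nn.Softmax(dim=0)(logits)
probs.retain_grad()   # probs is non-leaf (output of softmax), so retain_grad() applies here
loss = -probs[3].log()
loss.backward()
print(probs.grad)     # d(loss)/d(probs); only index 3 is non-zero since the loss uses probs[3] only
print(logits.grad)    # the leaf .grad is populated as usual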