Grad is None even when requires_grad=True

li012589 · November 17, 2018, 7:08am

I ran into this weird behavior of autograd when trying to initialize weights.

Here is a minimal case:

import torch

print("Trial 1: with python float")
w = torch.randn(3,5,requires_grad = True) * 0.01

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

print("Trial 2: with on-the-go torch scalar")
w = torch.randn(3,5,requires_grad = True) * torch.tensor(0.01,requires_grad=True)

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

print("Trial 3: with named torch scalar")
t = torch.tensor(0.01,requires_grad=True)
w = torch.randn(3,5,requires_grad = True) * t

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

The output looks like this:

Trial 1: with python float
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0267, -0.0267, -0.0267, -0.0267],
        [ 0.0040,  0.0040,  0.0040,  0.0040],
        [-0.0034, -0.0034, -0.0034, -0.0034],
        [-0.0010, -0.0010, -0.0010, -0.0010],
        [ 0.0215,  0.0215,  0.0215,  0.0215]])
Trial 2: with on-the-go torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0028, -0.0028, -0.0028, -0.0028],
        [ 0.0130,  0.0130,  0.0130,  0.0130],
        [-0.0027, -0.0027, -0.0027, -0.0027],
        [ 0.0054,  0.0054,  0.0054,  0.0054],
        [ 0.0133,  0.0133,  0.0133,  0.0133]])
Trial 3: with named torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[ 0.0227,  0.0227,  0.0227,  0.0227],
        [ 0.0101,  0.0101,  0.0101,  0.0101],
        [-0.0200, -0.0200, -0.0200, -0.0200],
        [-0.0052, -0.0052, -0.0052, -0.0052],
        [-0.0031, -0.0031, -0.0031, -0.0031]])

You can see that even though w.requires_grad is True, w.grad is still None. Is this the intended behavior?

I know that adding w.requires_grad_() can solve this problem, but shouldn’t autograd at least change the tensor’s requires_grad to false?

InnovArul · November 17, 2018, 7:35am

I think this touches on the concept of leaf variables versus intermediate variables.
As far as I can see, in all three cases w is an intermediate variable, and the gradients are accumulated in the torch.randn(..., requires_grad=True) instance (which is one of the leaves of the computation graph). The gradients of all intermediate variables (including w) are discarded during the backward() call. If you want to retain those gradients, call w.retain_grad() before calling backward().
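As a minimal sketch of the retain_grad() approach described above (the name base and the fixed seed are only for illustration and are not from the original post), the leaf and the intermediate tensor can be inspected separately:

```python
import torch

torch.manual_seed(0)  # illustrative only, makes the run repeatable

base = torch.randn(3, 5, requires_grad=True)  # leaf: created directly by the user
w = base * 0.01                               # non-leaf: result of an operation
w.retain_grad()                               # ask autograd to keep w.grad after backward()

x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("base.is_leaf:", base.is_leaf)  # leaf tensor
print("w.is_leaf:", w.is_leaf)        # intermediate tensor
print("base.grad:", base.grad)        # populated, because base is a leaf
print("w.grad:", w.grad)              # populated only because of retain_grad()
```

Without the w.retain_grad() line, w.grad would stay None while base.grad would still be filled in.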

li012589 · November 17, 2018, 8:17am

Thanks for the reply!

This clears things up. Now I understand why w doesn’t get a grad.

Rohit_Deshpande · August 27, 2020, 3:35pm

I’m getting None for grad after specifying retain_grad().
Can anyone help?

import torch
import numpy as np

# Temp, rain, hum
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

# apples and oranges
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

# weights and Biases
w = torch.randn(2,3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)


#  Define model
def model(x):
  return x @ w.t() + b

predit = model(inputs)

def mse(t1, t2):
  diff = t1 - t2
  return torch.sum(diff * diff)/ diff.numel()

loss = mse(predit, targets)
loss


# Compute gradient
w.retain_grad()
b.retain_grad()
loss.backward()

print(w)
print(w.grad)

albanD · August 27, 2020, 3:59pm

Hi,

Running this code does print a non-None gradient. You can check it in this colab notebook: https://colab.research.google.com/drive/1dXUTa5bx2pfewb52KIGpftXu66k7qeTj?usp=sharing
How did you install pytorch? Which version is it? Do you do anything else?

Rohit_Deshpande · August 27, 2020, 4:53pm

@albanD Thanks, I was also able to print the grad. I’m running torch =>1.6.0 in Colab. After I reinstalled PyTorch, I was able to run it.

Sharifi · September 13, 2020, 1:26pm

I cannot understand why the gradients of w should be removed during backward().
We want to update w (our model’s parameter), and we need its gradients.
Am I wrong?

Sharifi · September 13, 2020, 3:49pm

I also had a similar problem. Although all the variables had requires_grad = True, the gradients were None.

After scrutinizing my code, I realized that in one of my modules there was a with torch.no_grad(): block in the forward method that disabled gradient computation. I removed it and the problem was solved.
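A small sketch of that effect, using made-up tensors rather than the module from my code: anything computed inside torch.no_grad() is detached from the graph, so nothing can flow back to the parameters through it.

```python
import torch

x = torch.ones(3)
w = torch.randn(3, requires_grad=True)

# Computed with gradient tracking disabled: no graph is recorded.
with torch.no_grad():
    out_no_grad = (w * x).sum()

# Same computation with tracking enabled.
out = (w * x).sum()

print("inside no_grad, requires_grad:", out_no_grad.requires_grad)
print("inside no_grad, grad_fn:", out_no_grad.grad_fn)
print("outside no_grad, requires_grad:", out.requires_grad)

out.backward()
print("w.grad:", w.grad)  # d(sum(w * x))/dw equals x
```

Calling backward() on out_no_grad would raise an error, because it has no grad_fn to backpropagate through.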

Yangmin · November 24, 2021, 3:43am

I am also curious. Is there an answer for this?

jiuluan_lv · September 22, 2022, 12:36pm

Tensors that have requires_grad=True are leaf tensors if they were created by the user. This means that they are not the result of an operation, and so their grad_fn is None.

In this line:

w = torch.randn(3,5,requires_grad = True) * 0.01

We could also write it like this, which is equivalent to the line above:

temp = torch.randn(3,5,requires_grad = True)
w =  temp * 0.01

Use the is_leaf attribute to check whether a variable is a leaf variable or not.

Here w is the result of the multiplication, so it is not a leaf. That is the reason why w does not have a grad.
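A short sketch of that check, plus one possible way to get a scaled weight that is itself a leaf (the name w_leaf is only for illustration; scaling first and then calling requires_grad_() is an assumption about what the initialization needs, not the only option):

```python
import torch

temp = torch.randn(3, 5, requires_grad=True)
w = temp * 0.01

print("temp.is_leaf:", temp.is_leaf, "| temp.grad_fn:", temp.grad_fn)
print("w.is_leaf:", w.is_leaf, "| w.grad_fn:", w.grad_fn)

# Scale first, then turn on gradient tracking in place,
# so the scaled tensor itself is the leaf.
w_leaf = (torch.randn(3, 5) * 0.01).requires_grad_()

x = torch.randn(5, 4)
torch.matmul(w_leaf, x).sum().backward()

print("w_leaf.is_leaf:", w_leaf.is_leaf)
print("w_leaf.grad shape:", tuple(w_leaf.grad.shape))
```

With this ordering the gradient lands directly on w_leaf, so it can be updated like any other parameter.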