Grad is None even when requires_grad=True

I ran into this weird autograd behavior when trying to initialize weights.

Here is a minimal case:

import torch

print("Trial 1: with python float")
w = torch.randn(3,5,requires_grad = True) * 0.01

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

print("Trial 2: with on-the-go torch scalar")
w = torch.randn(3,5,requires_grad = True) * torch.tensor(0.01,requires_grad=True)

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

print("Trial 3: with named torch scalar")
t = torch.tensor(0.01,requires_grad=True)
w = torch.randn(3,5,requires_grad = True) * t

x = torch.randn(5,4,requires_grad = True)

y = torch.matmul(w,x).sum(1)

y.backward(torch.ones(3))

print("w.requires_grad:",w.requires_grad)
print("x.requires_grad:",x.requires_grad)

print("w.grad",w.grad)
print("x.grad",x.grad)

The output looks like this:

Trial 1: with python float
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0267, -0.0267, -0.0267, -0.0267],
        [ 0.0040,  0.0040,  0.0040,  0.0040],
        [-0.0034, -0.0034, -0.0034, -0.0034],
        [-0.0010, -0.0010, -0.0010, -0.0010],
        [ 0.0215,  0.0215,  0.0215,  0.0215]])
Trial 2: with on-the-go torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0028, -0.0028, -0.0028, -0.0028],
        [ 0.0130,  0.0130,  0.0130,  0.0130],
        [-0.0027, -0.0027, -0.0027, -0.0027],
        [ 0.0054,  0.0054,  0.0054,  0.0054],
        [ 0.0133,  0.0133,  0.0133,  0.0133]])
Trial 3: with named torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[ 0.0227,  0.0227,  0.0227,  0.0227],
        [ 0.0101,  0.0101,  0.0101,  0.0101],
        [-0.0200, -0.0200, -0.0200, -0.0200],
        [-0.0052, -0.0052, -0.0052, -0.0052],
        [-0.0031, -0.0031, -0.0031, -0.0031]])

You can see that even though the tensors’ requires_grad is True, their grad is still None. Is this the intended behavior?

I know that adding w.requires_grad_() can solve this problem, but shouldn’t autograd at least change the tensor’s requires_grad to false?

I think this touches on the concept of leaf variables and intermediate variables.
As far as I can see, in all three cases w is an intermediate variable, and the gradients are accumulated in the torch.randn(..., requires_grad=True) instance, which is one of the leaves of the computation graph. The gradients of intermediate variables (including w) are not kept during the backward() call. If you want to retain those gradients, call w.retain_grad() before calling backward().
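
To make the leaf vs. intermediate distinction concrete, here is a minimal sketch. The names w_leaf and w_mid are made up for illustration and do not appear in the original post. It checks is_leaf on both tensors, then shows that backward() fills in the gradient of the leaf, and that retain_grad() makes the gradient of the intermediate tensor available as well.

```python
import torch

torch.manual_seed(0)

# Leaf tensor: created directly by the user with requires_grad=True.
w_leaf = torch.randn(3, 5, requires_grad=True)

# Intermediate (non-leaf) tensor: the result of an operation on a leaf.
w_mid = w_leaf * 0.01
w_mid.retain_grad()  # ask autograd to keep .grad for this non-leaf tensor

x = torch.randn(5, 4)
y = torch.matmul(w_mid, x).sum()
y.backward()

print("w_leaf.is_leaf:", w_leaf.is_leaf)            # True
print("w_mid.is_leaf:", w_mid.is_leaf)              # False
print("w_leaf.grad is None:", w_leaf.grad is None)  # False
print("w_mid.grad is None:", w_mid.grad is None)    # False (thanks to retain_grad)
```

Without the retain_grad() call, w_mid.grad would be None after backward(), which matches what the three trials in the question show for w.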

Thanks for the reply!

This clears things up, now I understand why w doesn’t get a grad. :grinning:

I’m getting None for grad after specifying retain_grad().
Can anyone help?

import torch
import numpy as np

# Temp, rain, hum
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

# apples and oranges
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

# weights and Biases
w = torch.randn(2,3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)


#  Define model
def model(x):
  return x @ w.t() + b

predit = model(inputs)

def mse(t1, t2):
  diff = t1 - t2
  return torch.sum(diff * diff)/ diff.numel()

loss = mse(predit, targets)
loss


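# Compute gradient
# Note: w and b are leaf tensors, so retain_grad() is a no-op here;
# their .grad is populated by backward() either way.
w.retain_grad()
b.retain_grad()
loss.backward()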

print(w)
print(w.grad)

Hi,

Running this code does print a non-None gradient. You can check it in this colab notebook: https://colab.research.google.com/drive/1dXUTa5bx2pfewb52KIGpftXu66k7qeTj?usp=sharing
How did you install PyTorch? Which version is it? Are you doing anything else?

@albanD Thanks, I was also able to print the grad. I’m running torch >= 1.6.0 in Colab. After I reinstalled PyTorch, the code ran correctly.

I don’t understand why the gradients of w should be removed during backward().
We want to update w (our model’s parameter), and we need its gradients for that.
Am I wrong?
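
One way to reconcile this, going by the leaf/intermediate explanation earlier in the thread: gradients are kept for leaf tensors, so a model parameter should itself be a leaf. Below is a minimal sketch, assuming the goal from the first post (initial weights scaled by 0.01). The scaling is done first, on a tensor with no autograd history, and only then is requires_grad switched on.

```python
import torch

# Scale first (no autograd history yet), then mark the result as requiring grad.
# The scaled tensor is then a leaf, so backward() populates w.grad.
w = (torch.randn(3, 5) * 0.01).requires_grad_()
x = torch.randn(5, 4, requires_grad=True)

y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.is_leaf:", w.is_leaf)        # True
print("w.grad shape:", w.grad.shape)  # torch.Size([3, 5])
```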

I also had a similar problem. Although all the variables had requires_grad=True, the gradients were None.

After scrutinizing my code, I realized that one of my modules had a with torch.no_grad(): block in its forward method, which disabled gradient computation. I removed it and the problem was solved.
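
As a rough illustration of that effect (a standalone sketch, not the poster's actual module), a result computed inside torch.no_grad() is cut off from the graph, while the same computation outside the block lets backward() reach w:

```python
import torch

w = torch.randn(3, 5, requires_grad=True)
x = torch.randn(5, 4)

# Inside no_grad, operations are not recorded, so the result is cut off from w.
with torch.no_grad():
    y_detached = torch.matmul(w, x).sum()
print("inside no_grad, requires_grad:", y_detached.requires_grad)  # False

# Outside no_grad, the graph is recorded and backward() reaches w.
y = torch.matmul(w, x).sum()
y.backward()
print("w.grad is None:", w.grad is None)  # False
```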

I am also curious. Is there an answer to this?