Grad is None even when requires_grad=True


(Li S H) #1

I ran into this weird behavior of autograd while trying to initialize weights.

Here is a minimal case:

import torch

print("Trial 1: with python float")
w = torch.randn(3, 5, requires_grad=True) * 0.01
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 2: with on-the-fly torch scalar")
w = torch.randn(3, 5, requires_grad=True) * torch.tensor(0.01, requires_grad=True)
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 3: with named torch scalar")
t = torch.tensor(0.01, requires_grad=True)
w = torch.randn(3, 5, requires_grad=True) * t
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

The output is:

Trial 1: with python float
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0267, -0.0267, -0.0267, -0.0267],
        [ 0.0040,  0.0040,  0.0040,  0.0040],
        [-0.0034, -0.0034, -0.0034, -0.0034],
        [-0.0010, -0.0010, -0.0010, -0.0010],
        [ 0.0215,  0.0215,  0.0215,  0.0215]])
Trial 2: with on-the-fly torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[-0.0028, -0.0028, -0.0028, -0.0028],
        [ 0.0130,  0.0130,  0.0130,  0.0130],
        [-0.0027, -0.0027, -0.0027, -0.0027],
        [ 0.0054,  0.0054,  0.0054,  0.0054],
        [ 0.0133,  0.0133,  0.0133,  0.0133]])
Trial 3: with named torch scalar
w.requires_grad: True
x.requires_grad: True
w.grad None
x.grad tensor([[ 0.0227,  0.0227,  0.0227,  0.0227],
        [ 0.0101,  0.0101,  0.0101,  0.0101],
        [-0.0200, -0.0200, -0.0200, -0.0200],
        [-0.0052, -0.0052, -0.0052, -0.0052],
        [-0.0031, -0.0031, -0.0031, -0.0031]])

You can see that even though w.requires_grad is True, w.grad is still None in all three trials. Is this the expected behavior?

I know that calling w.requires_grad_() can solve this problem (see the sketch below), but shouldn't autograd at least set the tensor's requires_grad to False?
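
By that workaround I mean something along these lines, a rough sketch of Trial 1: do the scaling first, while no gradients are being tracked, and only then turn on requires_grad, so that w ends up as a leaf tensor.

import torch

w = torch.randn(3, 5) * 0.01   # no grad tracking yet, so w has no history and is a leaf
w.requires_grad_()             # now w is a leaf that requires grad, so w.grad will be filled in

x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.grad", w.grad)        # a (3, 5) tensor instead of None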


(Arul) #2

I think this touches upon the concept of leaf variables and intermediate variables.
As far as I can see, in all three cases w is an intermediate variable: it is the result of a multiplication, so the gradient is accumulated in the torch.randn(..., requires_grad=True) tensor instead (which is one of the leaves of the computation graph). The gradients of intermediate variables (including w) are not retained after the backward() call. If you want to retain those gradients, call w.retain_grad() before calling backward().
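
For example, here is a minimal sketch based on your Trial 1; a is just a name I give to the randn leaf tensor that was anonymous in your snippet:

import torch

a = torch.randn(3, 5, requires_grad=True)   # leaf tensor: this is where autograd accumulates .grad
w = a * 0.01                                # w is an intermediate (non-leaf) result of the multiplication
w.retain_grad()                             # explicitly ask autograd to also keep w's gradient

x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.is_leaf:", w.is_leaf)   # False
print("a.grad", a.grad)          # populated: the gradient lands on the leaf
print("w.grad", w.grad)          # also populated now, thanks to retain_grad()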


(Li S H) #3

Thanks for the reply!

This clears things up; now I understand why w doesn't get a grad. :grinning: