I ran into this weird autograd behavior when trying to initialize weights.

Here is a minimal case:

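``````
import torch

# NOTE: the x definitions, the scalar factors in Trials 2 and 3, and the print
# calls are assumed; x is taken to be a 5x4 leaf, matching the x.grad output.

print("Trial 1: with python float")
w = torch.randn(3, 5, requires_grad=True) * 0.01
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 2: with on-the-go torch scalar")
w = torch.randn(3, 5, requires_grad=True) * torch.tensor(0.01)
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 3: with named torch scalar")
t = torch.tensor(0.01)
w = torch.randn(3, 5, requires_grad=True) * t
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)
``````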

The output should be

``````
Trial 1: with python float
[ 0.0040,  0.0040,  0.0040,  0.0040],
[-0.0034, -0.0034, -0.0034, -0.0034],
[-0.0010, -0.0010, -0.0010, -0.0010],
[ 0.0215,  0.0215,  0.0215,  0.0215]])
Trial 2: with on-the-go torch scalar
[ 0.0130,  0.0130,  0.0130,  0.0130],
[-0.0027, -0.0027, -0.0027, -0.0027],
[ 0.0054,  0.0054,  0.0054,  0.0054],
[ 0.0133,  0.0133,  0.0133,  0.0133]])
Trial 3: with named torch scalar
x.grad tensor([[ 0.0227,  0.0227,  0.0227,  0.0227],
[ 0.0101,  0.0101,  0.0101,  0.0101],
[-0.0200, -0.0200, -0.0200, -0.0200],
[-0.0052, -0.0052, -0.0052, -0.0052],
[-0.0031, -0.0031, -0.0031, -0.0031]])

``````

You can see that even though the tensors' `requires_grad` is True, their `grad` is still None. Is this the intended behavior?

I know that adding `w.requires_grad_()` can solve this problem, but shouldn't autograd at least change the tensor's requires_grad to false?

I think this touches on the concept of leaf variables and intermediate variables.
As far as I can see, in all three cases `w` is an intermediate variable, and the gradients are accumulated in the `torch.randn(..., requires_grad=True)` instance (which is one of the roots of the computation tree). The gradients of all intermediate variables (including `w`) are discarded during the `backward()` call. If you want to retain those gradients, call `w.retain_grad()` before calling `backward()`.
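To make the leaf vs. intermediate distinction concrete, here is a minimal sketch. The names `w_raw` and `w_leaf` are mine and not from the original post; the shapes follow the question. It shows `retain_grad()` on the scaled tensor and, as an alternative, scaling first and only then marking the result as requiring grad so that it is itself a leaf.

```python
import torch

torch.manual_seed(0)

# The tensor returned by randn is a leaf; multiplying it creates a new,
# non-leaf (intermediate) tensor.
w_raw = torch.randn(3, 5, requires_grad=True)  # leaf (hypothetical name)
w = w_raw * 0.01                               # intermediate
w.retain_grad()                                # ask autograd to keep w.grad

x = torch.randn(5, 4, requires_grad=True)      # leaf
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print(w_raw.is_leaf, w.is_leaf)   # True False
print(w_raw.grad is not None)     # True: accumulated on the leaf
print(w.grad is not None)         # True: only because of retain_grad()

# Alternative: scale first, then mark the result as requiring grad.
# w_leaf is then a leaf and gets .grad without retain_grad().
w_leaf = (torch.randn(3, 5) * 0.01).requires_grad_()
y2 = torch.matmul(w_leaf, x).sum(1)
y2.backward(torch.ones(3))
print(w_leaf.is_leaf, w_leaf.grad is not None)  # True True
```

If the goal is to train `w` with an optimizer, the second form is probably what you want, since the optimizer should hold the leaf tensor.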

This clears things up; now I understand why `w` doesn't get a grad.

Can anyone help?

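``````
import torch
import numpy as np

# Temp, rain, hum
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

# apples and oranges
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

# Weights and biases (definitions were missing from the post; shapes are
# assumed from the model: 2 targets x 3 input features)
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

# Define model
def model(x):
    return x @ w.t() + b

predit = model(inputs)

# Mean squared error (the return line was missing; standard MSE assumed)
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

loss = mse(predit, targets)
print(loss)

loss.backward()

print(w)
print(w.grad)  # assumed: the gradient print discussed in the replies
``````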

Hi,

Running this code does print a non-None gradient. You can check it in this colab notebook: https://colab.research.google.com/drive/1dXUTa5bx2pfewb52KIGpftXu66k7qeTj?usp=sharing
How did you install PyTorch? Which version is it? Are you doing anything else?

@albanD Thanks, I was also able to print the grad. I'm running torch >=1.6.0 and I'm executing in Colab. After I reinstalled PyTorch, I was able to run it.

I cannot understand why the gradients of `w` should be removed during `backward()`.
We want to update `w` (our model's parameter), and we need its gradients.
Am I wrong?

I also had a similar problem. Although all the variables had `requires_grad=True`, the gradients were `None`.

After scrutinizing my code, I realized that in one of my modules there was a `with torch.no_grad():` block in the forward method that disabled gradient computation. I removed it and the problem was solved.
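A small sketch of how this can happen, in case it helps someone spot the same thing. The `TwoLayer` module and its `freeze_first` flag are made up for illustration and are not the code from my project. A parameter used only inside a `torch.no_grad()` block keeps `requires_grad=True` but never receives a gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TwoLayer(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, freeze_first):
        super().__init__()
        self.first = nn.Linear(3, 4)
        self.second = nn.Linear(4, 2)
        self.freeze_first = freeze_first

    def forward(self, x):
        if self.freeze_first:
            # Nothing computed in this block is recorded in the graph.
            with torch.no_grad():
                h = self.first(x)
        else:
            h = self.first(x)
        return self.second(h)

x = torch.randn(5, 3)

frozen = TwoLayer(freeze_first=True)
frozen(x).sum().backward()
print(frozen.first.weight.requires_grad)  # True
print(frozen.first.weight.grad)           # None
print(frozen.second.weight.grad is None)  # False

tracked = TwoLayer(freeze_first=False)
tracked(x).sum().backward()
print(tracked.first.weight.grad is None)  # False
```

Checking `param.grad is None` for every entry in `named_parameters()` after one backward pass is a quick way to find which part of the model is cut off from the graph.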

I am also curious. Is there an answer for this?