Why is weights.grad None after trying to adjust weights?

TenYa · June 6, 2022, 1:00pm

Hello

I’m in my first week of learning PyTorch and currently trying gradient descent. I’ve run into a problem where when I try to adjust the weights of my model, I get an error on my first attempt and the gradient of my weights becomes None. As far as I could see, it’s the calculation itself that causes the problem, as the gradient is still a tensor until right before the calculation.

I don’t know if it’s necessary, but here are my model and my mean-square-error function (can’t put more than one media in a post)

weights = torch.randn(2,3, requires_grad=True)
biases = torch.randn(2, requires_grad=True)

def model(x):
    return x @ weights.T + biases

def mse(pred, target):
    diff = pred - target
    diff_square = diff**2
    loss = torch.mean(diff_square)
    return loss

Any help is appreciated!

AlphaBetaGamma96 · June 6, 2022, 1:34pm

I’m not 100% sure, but it seems you’re not populating your .grad attribute when backpropagating your loss. It might happen due to you calling .backward() directly on your loss function and not on the loss itself. Regardless, the following minimal reproducible example will populate the .grad attributes and return gradients:

import torch

weights = torch.randn(2,3, requires_grad=True)
biases = torch.randn(2, requires_grad=True)

def model(x):
  return x @ weights.t() + biases

def mse(pred, target):
  diff = pred - target
  diff_square = diff**2
  loss = torch.mean(diff_square)
  return loss
  
inputs=torch.randn(10,3) #random input/output data
targets=torch.randn(10,2)

loss = mse(model(inputs), targets) #define scalar loss
loss.backward() #backprop here

print(weights.grad)
print(biases.grad)
"""
returns 
tensor([[ 0.0731,  0.0566,  0.0354],
        [-1.4290,  0.9217,  2.7733]])
tensor([-0.3319,  1.3362])
"""

TenYa · June 6, 2022, 1:52pm

I’ve checked the gradient on every step right until the adjustment step and it is still there. But as soon as I execute the subtraction “weights = weights - (1e-5 * weights.grad)” it becomes None, while weights gets calculated correctly. Not calling backward() on my function is definitely a better idea though and better to read.

AlphaBetaGamma96 · June 6, 2022, 2:02pm

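What’s probably happening then is that when you re-define weights as the difference of two tensors, the name weights now refers to a new tensor produced by that subtraction, not the original one the gradient was computed for. That new tensor has no populated .grad attribute, which is why you see None.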
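A small sketch of why that happens, assuming a toy loss (the sum of squared weights) and the hypothetical names updated and grad_after, none of which come from the original post. It only inspects the tensors before and after an out-of-place subtraction:

```python
import warnings

import torch

weights = torch.randn(2, 3, requires_grad=True)

# Toy scalar loss, used here only so that backward() has something to differentiate.
loss = (weights ** 2).sum()
loss.backward()

print(weights.is_leaf)           # True: created directly by the user
print(weights.grad is not None)  # True: backward() populated it

# Out-of-place subtraction builds a new tensor; the original is left untouched.
updated = weights - 1e-5 * weights.grad

print(updated.is_leaf)           # False: it is the result of a tracked operation
print(updated.requires_grad)     # True

with warnings.catch_warnings():
    # Reading .grad on a non-leaf tensor emits a UserWarning; silence it here.
    warnings.simplefilter("ignore")
    grad_after = updated.grad

print(grad_after)                # None
```

Writing weights = weights - ... does the same thing, except that the name weights ends up pointing at the new tensor, so the original leaf tensor and its gradient are no longer reachable through that name.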

When updating tensors via the .grad attribute, you’ll want to do it within a torch.no_grad() context manager. That removes the UserWarning, but the reassigned weights still has .grad = None. The .grad attribute is populated when you call .backward(), so if you want .grad again you’ll need to recompute the gradient.

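with torch.no_grad():
  weights.retain_grad()  # no-op here, since weights is still a leaf tensor
  print("grad (before): ",weights.grad)
  weights = weights - weights.grad  # out-of-place: weights now names a new tensor
  print(weights)
  print("grad (after):  ",weights.grad)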

"""
returns 
grad (before):  tensor([[-0.1729, -0.9709,  1.2458],
        [ 0.2753, -0.6027,  0.1230]])
tensor([[ 0.2790,  0.4543, -0.0610],
        [ 1.4604,  0.2984, -0.2316]])
grad (after):   None
"""
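One way to keep .grad available is to update the tensors in place instead of reassigning them. The following is a minimal sketch, not necessarily what the original poster ended up using; the learning rate lr, the step count, the random data, and the losses list are assumptions made for illustration:

```python
import torch

torch.manual_seed(0)

weights = torch.randn(2, 3, requires_grad=True)
biases = torch.randn(2, requires_grad=True)

inputs = torch.randn(10, 3)   # random input/output data
targets = torch.randn(10, 2)

lr = 1e-2  # hypothetical learning rate

def model(x):
    return x @ weights.t() + biases

def mse(pred, target):
    return torch.mean((pred - target) ** 2)

losses = []
for step in range(100):
    loss = mse(model(inputs), targets)
    loss.backward()  # populates weights.grad and biases.grad

    with torch.no_grad():
        # In-place updates modify the existing leaf tensors,
        # so the names keep pointing at the same objects.
        weights -= lr * weights.grad
        biases -= lr * biases.grad

    # Gradients accumulate across backward() calls, so reset them each step.
    weights.grad.zero_()
    biases.grad.zero_()

    losses.append(loss.item())

print(weights.is_leaf, weights.grad is not None)
print(losses[0], losses[-1])
```

Because -= changes the tensor in place, weights stays a leaf tensor with requires_grad=True and its .grad remains a tensor after every step.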

TenYa · June 6, 2022, 3:46pm

I’m not sure I understood you correctly, but I tried calling backward before resetting the gradients and it still gives the same error.

AlphaBetaGamma96 · June 6, 2022, 4:13pm

You need to call loss.backward() before using .grad, as it defaults to None. Then, when you update the weights, the .grad attribute gets removed, so there’s no need to call .grad.zero_(): a .grad of None is treated the same as a zeroed tensor.

If you want to do gradient descent, you can use the torch.optim.SGD class which will perform gradient descent too.
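A rough sketch of that approach, reusing the model and mse functions from earlier in the thread. The learning rate, the number of steps, the random data, and the losses list are assumptions made for illustration:

```python
import torch

torch.manual_seed(0)

weights = torch.randn(2, 3, requires_grad=True)
biases = torch.randn(2, requires_grad=True)

inputs = torch.randn(10, 3)   # random input/output data
targets = torch.randn(10, 2)

def model(x):
    return x @ weights.t() + biases

def mse(pred, target):
    return torch.mean((pred - target) ** 2)

# The optimizer holds references to the leaf tensors and updates them in place.
optimizer = torch.optim.SGD([weights, biases], lr=1e-2)

losses = []
for step in range(100):
    optimizer.zero_grad()   # clear gradients from the previous step
    loss = mse(model(inputs), targets)
    loss.backward()         # populate .grad on weights and biases
    optimizer.step()        # apply the gradient descent update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Since the optimizer never rebinds weights or biases, both remain leaf tensors and .grad is still populated after each step.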