[Possible BUG] Strange Behavior of backward

Hi there,

I have run these pieces of code, but the results are strange:

import torch

x = torch.zeros(10, 300, requires_grad=True)
loss = torch.max(x)
loss.backward(retain_graph=True)
x.grad  # populated with the gradient of loss w.r.t. x

Here, x.grad is correct.

Here is another script:

import torch

device = torch.device("cuda")  # any non-CPU device, e.g. a GPU

x = torch.zeros(10, 300, requires_grad=True).to(device)
loss = torch.max(x)
loss.backward(retain_graph=True)
x.grad  # None

x.grad becomes None.

Why? Thanks!

I am unable to run the first script because x.backward throws an error saying "grad can be implicitly created only for scalar outputs".

My bad, I missed something. Could you please check it again?

Please just refresh the webpage.

In the first example x is a leaf tensor, since it is a tensor that you directly created. By default, gradients are computed only with respect to leaf tensors, so x.grad has actual numbers in it.
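You can confirm this with is_leaf (a quick sketch of the first case):

import torch

x = torch.zeros(10, 300, requires_grad=True)
print(x.is_leaf)  # True: you created x directly, so it is a leaf tensor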

In the second example x is not a leaf tensor, since it is not a tensor that you directly created: it is a tensor that you obtained by calling a function (to) on another tensor. By default, gradients are computed only with respect to leaf tensors, so x.grad is None.
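The same check shows the difference for the .to() case (a quick sketch, assuming device is a CUDA device, i.e. different from the device x starts on, so that to() actually returns a new tensor):

import torch

device = torch.device("cuda")  # assumed: a device other than the one x already lives on
x = torch.zeros(10, 300, requires_grad=True).to(device)
print(x.is_leaf)  # False: x is now the output of to(), not a tensor you created directly
print(x.grad_fn)  # the autograd node recorded for the to() call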

Here is some code that may help make this distinction clearer:

import torch

device = torch.device("cuda")  # any device other than the one x lives on

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)  # None: backward has not been called yet
y = x.to(device)
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)  # None: y is not a leaf tensor, so its grad is not populated
print(x.grad)  # a tensor of gradients: x is a leaf tensor

You can make PyTorch store the gradient of a non-leaf tensor in its .grad field by calling retain_grad() on that tensor. Here is the same example as above, with an extra call to y.retain_grad():

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)    # None: backward has not been called yet
y = x.to(device)
y.retain_grad()  # ask autograd to populate grad for this non-leaf tensor
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)    # now a tensor of gradients, thanks to retain_grad()
print(x.grad)    # a tensor of gradients, as before
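If the goal is simply to have x.grad populated on the device, another option is to create the tensor directly on that device so it remains a leaf (a sketch, assuming a CUDA device is available):

import torch

device = torch.device("cuda")  # assumed CUDA device
x = torch.zeros(10, 300, device=device, requires_grad=True)  # created directly on the device, so still a leaf
loss = torch.max(x)
loss.backward()
print(x.grad)  # populated, no retain_grad() needed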

Thank you for the clarification. Now I understand what happens in the .to() method.
