[Possible BUG] Strange Behavior of backward

Hi there,

I have run these pieces of code, but the results are strange:

import torch

x = torch.zeros(10, 300, requires_grad=True)
loss = torch.max(x)
loss.backward(retain_graph=True)
x.grad  # populated with the gradient of loss w.r.t. x

Here, x.grad is correct.

Here is another script:

import torch

device = torch.device("cuda")  # any non-CPU device, e.g. a GPU

x = torch.zeros(10, 300, requires_grad=True).to(device)
loss = torch.max(x)
loss.backward(retain_graph=True)
x.grad  # None

x.grad becomes None.

Why? Thanks!

I am unable to run the first script because x.backward throws an error saying "grad can be implicitly created only for scalar outputs".

My bad, I missed something. Could you please check it again?

Please just refresh the webpage.

In the first example x is a leaf tensor, since it is a tensor that you directly created. By default, gradients are computed only with respect to leaf tensors, so x.grad has actual numbers in it.
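You can confirm this with is_leaf (a quick sketch of the first case):

import torch

x = torch.zeros(10, 300, requires_grad=True)
print(x.is_leaf)  # True: you created x directly, so it is a leaf tensor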

In the second example x is not a leaf tensor, since it is not a tensor that you directly created: it is a tensor that you obtained by calling a function (to) on another tensor. By default, gradients are computed only with respect to leaf tensors, so x.grad is None.
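The same check shows the difference for the .to() case (a quick sketch, assuming device is a CUDA device, i.e. different from the device x starts on, so that to() actually returns a new tensor):

import torch

device = torch.device("cuda")  # assumed: a device other than the one x already lives on
x = torch.zeros(10, 300, requires_grad=True).to(device)
print(x.is_leaf)  # False: x is now the output of to(), not a tensor you created directly
print(x.grad_fn)  # the autograd node recorded for the to() call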

Here is some code that may help make this distinction clearer:

import torch

device = torch.device("cuda")  # any device other than the one x lives on

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)  # None: backward has not been called yet
y = x.to(device)
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)  # None: y is not a leaf tensor, so its grad is not populated
print(x.grad)  # a tensor of gradients: x is a leaf tensor

You can make PyTorch store the gradient of a non-leaf tensor in its .grad field by calling retain_grad() on that tensor. Here is the same example as above, with an extra call to y.retain_grad():

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)    # None: backward has not been called yet
y = x.to(device)
y.retain_grad()  # ask autograd to populate grad for this non-leaf tensor
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)    # now a tensor of gradients, thanks to retain_grad()
print(x.grad)    # a tensor of gradients, as before
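If the goal is simply to have x.grad populated on the device, another option is to create the tensor directly on that device so it remains a leaf (a sketch, assuming a CUDA device is available):

import torch

device = torch.device("cuda")  # assumed CUDA device
x = torch.zeros(10, 300, device=device, requires_grad=True)  # created directly on the device, so still a leaf
loss = torch.max(x)
loss.backward()
print(x.grad)  # populated, no retain_grad() needed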

Thank you for the clarification. Now I understand what happens in the .to() method.
