Multiple calls to backward still work when retain_graph=False, why?

The tutorial (http://pytorch.org/tutorials/beginner/former_torchies/autograd_tutorial.html#gradients) says: "so if you even want to do the backward on some part of the graph twice, you need to pass in retain_variables = True during the first pass."

The example it gives is:

import torch
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
y.backward(torch.ones(2, 2), retain_graph=True)
print(x.grad)
z = y * y
print(z)
gradient = torch.randn(2, 2)
y.backward(gradient)
print(x.grad)

But when I try this code with retain_graph=True and with retain_graph=False, both versions run with no error, and the gradients are correct.
Is anything wrong with the example?

Thanks!


In this specific case you do not need retain_graph=True, but in general you may need it. As you compute the forward pass, PyTorch saves variables that will be needed to compute the gradients in the backward pass. For example, z = y * y needs to save the value of y, because dz/dy = 2*y (or y + y). However, y = x + 2 doesn’t need to save anything because dy/dx = 1 which doesn’t depend on x.

When you call backward() with retain_graph=False (or without specifying it), the automatic differentiation engine frees the saved variables as it computes the gradients. If you call backward() again, it will fail with an exception if it needs any of the freed saved variables. If it doesn't need any saved variables, as in your example, it will succeed, but you shouldn't rely on this behavior.
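
For instance, here is a minimal sketch (same style as your example, names chosen just for illustration) of the intended way to call backward twice on a graph that does save variables, by passing retain_graph=True on the first call:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x * x                                        # backward needs the saved value of x
y.backward(torch.ones(2, 2), retain_graph=True)  # keep the saved variables around
y.backward(torch.ones(2, 2))                     # second call works because the graph was retained
print(x.grad)                                    # dy/dx = 2*x from each call, accumulated to 4s

Note that the gradients from the two calls are accumulated into x.grad, so you would zero it between calls if you want each result separately.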

If you change y = x + 2 to y = x * x, you will see an error:

Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
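
A minimal reproduction of that error, for reference (same setup as above, only with y = x * x and without retain_graph):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x * x                     # saves x, since dy/dx = 2*x
y.backward(torch.ones(2, 2))  # retain_graph defaults to False, so saved variables are freed
y.backward(torch.ones(2, 2))  # raises the RuntimeError quoted above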

Thanks for the explanation!