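import torch

# Variable is deprecated since PyTorch 0.4; tensors track gradients directly.
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
# First backward pass; retain_graph=True keeps the saved buffers for a second pass.
y.backward(torch.ones(2, 2), retain_graph=True)
print(x.grad)
z = y * y
print(z)
gradient = torch.randn(2, 2)
# Second backward pass through the same graph; x.grad accumulates across calls.
y.backward(gradient)
print(x.grad)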

But when I run this code with retain_graph=True and with retain_graph=False, both versions work with no error, and the gradients are correct.
Is anything wrong with the example?

In this specific case you do not need retain_graph=True, but in general you may need it. As you compute the forward pass, PyTorch saves variables that will be needed to compute the gradients in the backward pass. For example, z = y * y needs to save the value of y, because dz/dy = 2*y (or y + y). However, y = x + 2 doesn't need to save anything, because dy/dx = 1, which doesn't depend on x.
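One way to check which operations save something is to count the tensors autograd stores during the forward pass. The sketch below assumes a PyTorch version that provides torch.autograd.graph.saved_tensors_hooks; the helper count_saved is a hypothetical name introduced only for this illustration.

```python
import torch

def count_saved(fn):
    """Run fn() and return how many tensors autograd saved for backward.

    count_saved is a hypothetical helper, not part of PyTorch.
    """
    saved = []

    def pack(t):
        # Called once for every tensor autograd saves during the forward pass.
        saved.append(t)
        return t

    def unpack(t):
        # Called when backward needs the saved tensor again.
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        fn()
    return len(saved)

x = torch.ones(2, 2, requires_grad=True)
add_saved = count_saved(lambda: x + 2)  # addition: gradient is constant
mul_saved = count_saved(lambda: x * x)  # multiplication: gradient depends on x
print("saved by x + 2:", add_saved)
print("saved by x * x:", mul_saved)
```

If the counts come out as expected, the addition saves nothing while the multiplication saves its inputs, which matches the dy/dx = 1 versus dz/dy = 2*y reasoning.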

When you call backward() with retain_graph=False (or without specifying it), the automatic differentiation engine frees the saved variables as it computes the gradients. If you call backward() again, it will fail with an exception if it needs any of the freed saved variables. If it doesn't need any saved variables, as in your example, it will succeed, but you shouldn't rely on this behavior.
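A minimal sketch of the "succeeds anyway" case, written with plain tensors instead of Variable (an assumption about the PyTorch version in use). Because y = x + 2 saves nothing, both calls go through even without retain_graph, and the gradients accumulate in x.grad:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2  # addition saves no tensors for the backward pass

# retain_graph is left at its default (False) on both calls.
y.backward(torch.ones(2, 2))
first_grad = x.grad.clone()

y.backward(torch.ones(2, 2))
second_grad = x.grad.clone()  # accumulated on top of the first call

print(first_grad)
print(second_grad)
```

This works only because nothing needed to be freed, so treat it as an accident of this particular graph and not as a guarantee.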

If you change y = x + 2 to y = x * x, you will see an error:

Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
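The failing case and its fix can be reproduced with a short sketch like the one below. It assumes the modern tensor API; the exact wording of the exception varies between PyTorch versions, so the code only checks that a RuntimeError is raised.

```python
import torch

# Case 1: multiplication saves x, and the default backward() frees it.
x = torch.ones(2, 2, requires_grad=True)
y = x * x
y.backward(torch.ones(2, 2))

try:
    y.backward(torch.ones(2, 2))  # needs the freed saved tensors
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second backward failed:", err)

# Case 2: retain_graph=True on the first call keeps the saved tensors.
x2 = torch.ones(2, 2, requires_grad=True)
y2 = x2 * x2
y2.backward(torch.ones(2, 2), retain_graph=True)
y2.backward(torch.ones(2, 2))  # succeeds; gradients accumulate in x2.grad
print(x2.grad)
```

Only the calls that will be followed by another backward pass need retain_graph=True; the last one can use the default so the saved tensors are released.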