What's the difference between retain_graph and retain_variables?

What's the difference between retain_graph and retain_variables for backward?

The docs say that when we need to backpropagate twice, we need to set retain_variables=True.

But I tried the example below:

import torch
from torch.autograd import Variable

f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] + f[1]
g.backward()
print(f.grad)
g.backward()
print(f.grad)

It works fine even though I didn't set retain_variables=True. Can anyone tell me why?

I'm also confused: the docs say the buffers are freed after the first backpropagation when retain_variables=True is not set, but why can't the buffers be recreated when the gradients are calculated a second time?


I asked the same question a few days ago.

I have two other questions above; could you answer them?

When you run the forward pass, the input values are saved, so that when you run the backward pass, the gradients can be properly calculated. Once the input values have been discarded, the gradients can no longer be computed.

Some operations, such as addition, do not require the inputs to be saved in order to properly calculate the gradients. Try multiplication instead.

import torch
from torch.autograd import Variable

f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] * f[1]
g.backward()
print(f.grad)
g.backward()   # fails here: the saved inputs (buffers) were freed after the first backward
print(f.grad)
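For reference, here is a minimal sketch of how to make the second backward pass work by keeping the saved buffers alive. It assumes a PyTorch version where the argument is spelled retain_graph; older releases used the name retain_variables for the same thing.

import torch
from torch.autograd import Variable

f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] * f[1]
g.backward(retain_graph=True)  # keep the saved inputs so backward can run again
print(f.grad)                  # gradients are [3, 2]
g.backward()                   # works now; gradients accumulate to [6, 4]
print(f.grad)

Note that gradients accumulate across backward calls, so you may want to zero f.grad between passes if you only want the gradient of the last one.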

Why doesn't the addition operation need a buffer? Could you explain in more detail?

You need to revise your calculus.

If f(x) = x + w then the gradient of f with respect to w is 1. In this case the gradient doesn’t depend on the inputs.

If f(x) = x * w then the gradient of f with respect to w is x. In this case, we need to save the input value.
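A quick sanity check with autograd itself (just an illustrative sketch with made-up values for x and w) shows the same thing: the gradient of x + w with respect to w is 1 regardless of x, while the gradient of x * w with respect to w equals x, which is why x has to be saved during the forward pass.

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([5.0]))                      # input
w = Variable(torch.Tensor([2.0]), requires_grad=True)  # parameter

(x + w).backward()
print(w.grad)   # 1: independent of x, so x does not need to be saved

w.grad.data.zero_()
(x * w).backward()
print(w.grad)   # 5: equal to x, so x must be saved in the forward pass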
