What’s the difference between retain_graph and retain_variables for backward()?
The docs say that when we need to backpropagate twice, we have to set retain_variables=True.
But I tried the example below:
import torch
from torch.autograd import Variable

f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] + f[1]
g.backward()
print(f.grad)
g.backward()   # second backward pass, without retain_variables=True
print(f.grad)
It works even though I didn’t set retain_variables=True. Can anyone tell me why?
I’m also confused because the docs say the buffers are freed after the first backward pass when retain_variables=True is not set. Why aren’t the buffers simply recreated when the gradients are computed a second time?
When you run the forward pass, the input values are saved so that the gradients can be computed correctly during the backward pass. Once those saved values have been discarded, the gradients can no longer be computed.
Some operations, such as addition, do not need their inputs to compute the gradients: the gradient of a sum with respect to each operand is just 1, regardless of the operands' values, so nothing has to be saved and a second backward pass still works. Multiplication, on the other hand, does need the saved inputs, since the gradient with respect to each factor is the value of the other factor. Try multiplication instead:
f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] * f[1]
g.backward()
print(f.grad)   # [3, 2]: gradient w.r.t. each factor is the other factor's value
g.backward()    # raises a RuntimeError: the saved inputs (buffers) have already been freed
print(f.grad)
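If you really do need to call backward twice on this graph, keep the saved inputs around on the first call. Here is a minimal sketch, assuming a PyTorch version whose backward() accepts retain_graph=True (older releases use retain_variables=True for the same purpose):

f = Variable(torch.Tensor([2, 3]), requires_grad=True)
g = f[0] * f[1]
g.backward(retain_graph=True)   # keep the saved inputs so the graph can be reused
print(f.grad)                   # [3, 2]
g.backward()                    # works now; gradients accumulate across calls
print(f.grad)                   # [6, 4], the first gradient added a second time

Note that gradients accumulate across backward calls, so if accumulation is not what you want, zero f.grad (e.g. f.grad.data.zero_()) between the two calls.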