Is retain_grad orthogonal to retain_graph? I am a bit confused about their difference. My current understanding is that the computation graph is encoded implicitly through the .grad_fn attributes, while the
retain_graph argument, when set to True during a backward call on a tensor x, causes autograd NOT to aggressively free the saved references to the intermediate tensors in the graph of x that are required for the gradient computation of x wrt some tensor.
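For reference, here is the kind of toy example I have in mind for retain_graph (a minimal sketch of my understanding, so please correct me if it is off):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()   # MulBackward saves references to its inputs

# First backward pass: retain_graph=True keeps the saved intermediate
# references alive instead of freeing them after the pass.
y.backward(retain_graph=True)

# Second backward over the same graph succeeds; without retain_graph=True
# above it would raise a RuntimeError because the saved references would
# already have been freed.
y.backward()

print(x.grad)  # gradients accumulate across the two passes: tensor([4., 4., 4.])
```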
y.retain_grad() is used to populate the grad attribute of y, which is a non-leaf tensor, when a .backward() call is made; this is the non-default behaviour (by default only leaf tensors have their grad attribute populated).
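And this is the toy example I have in mind for retain_grad() (again, a sketch of my current understanding):

```python
import torch

x = torch.ones(3, requires_grad=True)  # leaf tensor
y = x * 2                              # non-leaf (intermediate) tensor
y.retain_grad()                        # ask autograd to keep y.grad around

z = y.sum()
z.backward()

print(x.grad)  # populated by default for leaf tensors: tensor([2., 2., 2.])
print(y.grad)  # populated only because of retain_grad(): tensor([1., 1., 1.])
```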
Regarding "the saved references to the intermediate tensors in the graph of x that are required for the gradient computation of x wrt some tensor":
Where are these references stored? Are they only implicitly stored in the function