Is `retain_grad` orthogonal to `retain_graph`? I am a bit confused about their difference. My current understanding is that `retain_graph` retains `.grad_fn` attributes, while `retain_grad` retains `.grad` attributes.
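One way to test the "orthogonal" question empirically is to try all four flag combinations on the same small graph. This sketch assumes a standard PyTorch install, and `try_flags` is a hypothetical helper written for illustration. It records two things per combination: whether the non-leaf `y` ends up with a `.grad`, and whether a second `backward()` through the same graph succeeds.

```python
import warnings

import torch


def try_flags(use_retain_grad: bool, use_retain_graph: bool):
    """Hypothetical helper: run one backward pass and report what each flag changed."""
    x = torch.tensor(3.0, requires_grad=True)
    y = x * x                       # non-leaf; its MulBackward0 node saves the operands
    if use_retain_grad:
        y.retain_grad()             # only controls whether y.grad is filled in
    z = y * 2
    z.backward(retain_graph=use_retain_graph)  # only controls the saved operands

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # PyTorch warns when reading .grad of a non-leaf
        y_grad_populated = y.grad is not None

    try:
        z.backward()                # needs the saved operands of x * x again
        second_backward_ok = True
    except RuntimeError:
        second_backward_ok = False
    return y_grad_populated, second_backward_ok


for rg in (False, True):
    for rgraph in (False, True):
        print("retain_grad:", rg, "retain_graph:", rgraph, "->", try_flags(rg, rgraph))
```

Each flag flips exactly one of the two results, which is what you would expect if they control independent things.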

The `retain_graph` argument, when set to `True` during a backward call on a tensor `x`, causes autograd NOT to aggressively free the saved references to the intermediate tensors in the graph of `x` that are required to compute the gradient of `x` with respect to the graph's leaf tensors.
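A minimal sketch of that behaviour, assuming a standard PyTorch install: with `retain_graph=True` a second pass is allowed and the leaf gradients accumulate. The graph nodes (`grad_fn`) still exist after the buffers are freed, so what `retain_graph` keeps alive is the saved tensors, not the `.grad_fn` attributes themselves.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x          # MulBackward0 saves x (as both operands) for the backward pass
z = y * 2

z.backward(retain_graph=True)   # keep the saved tensors alive
z.backward()                    # allowed; the saved tensors are freed afterwards
print(x.grad)                   # tensor(24.): dz/dx = 4 * x = 12, accumulated twice

# The node objects survive; only the buffers they saved are gone.
print(z.grad_fn is not None)    # True

third_pass_failed = False
try:
    z.backward()                # a third pass would need the freed buffers
except RuntimeError as err:
    third_pass_failed = True
    print("RuntimeError:", err)
```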

`y.retain_grad()` is used to populate the `grad` attribute of `y`, which is a non-leaf tensor, when a `.backward()` call is made; this is not the default behaviour.
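A short sketch of the default versus opt-in behaviour, assuming a standard PyTorch install. Leaves get `.grad` automatically; of the two non-leaf tensors below, only the one that called `retain_grad()` gets a gradient stored.

```python
import warnings

import torch

x = torch.tensor(3.0, requires_grad=True)   # leaf: .grad is populated by default
y = x * x                                   # non-leaf: .grad is not kept by default
w = x * x                                   # a second non-leaf, left untouched
y.retain_grad()                             # opt in for y only (must precede backward)

(2 * y + w).backward()

print(x.grad)   # tensor(18.): d/dx (2x^2 + x^2) = 6x at x = 3
print(y.grad)   # tensor(2.): d/dy (2y + w)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # PyTorch warns when reading .grad of a non-leaf
    print(w.grad)   # None
```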

Where are these references stored? Are they only stored implicitly in the `grad_fn` function?
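One way to look for these references is to inspect the `grad_fn` node directly. This sketch assumes a recent PyTorch release that exposes saved tensors as `_saved_*` attributes on autograd nodes (underscore-prefixed, so not a stable API). Here `PowBackward` keeps `x` in `_saved_self`, and reading it after a `backward()` without `retain_graph=True` raises because the buffer has been released.

```python
import torch

x = torch.randn(5, requires_grad=True)
y = x.pow(2)                        # the pow node saves its input x to compute 2 * x * grad

node = y.grad_fn
print(type(node).__name__)          # the node class, e.g. a PowBackward variant
saved_matches = node._saved_self.equal(x)
print(saved_matches)                # True: the saved input is reachable from the node
print(node.next_functions)          # edge back to x's AccumulateGrad node

y.sum().backward()                  # retain_graph defaults to False

freed = False
try:
    node._saved_self                # the saved buffer was released by backward()
except RuntimeError:
    freed = True
    print("saved tensor has been freed")
```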