When you do a forward pass for a particular operation where some of the inputs have requires_grad=True, PyTorch needs to hold onto some of the inputs or intermediate values so that the backward pass can be computed.
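If you want to actually watch this happening, recent PyTorch releases expose torch.autograd.graph.saved_tensors_hooks, which lets you observe every tensor autograd stashes during the forward pass. A minimal sketch (the hook names are just illustrative):

import torch

def pack_hook(t):
    # Called each time autograd saves a tensor for the backward pass.
    print("autograd saved a tensor of shape", tuple(t.shape))
    return t

def unpack_hook(t):
    # Called when the backward pass needs the saved tensor back.
    return t

x = torch.randn(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    y = x * x   # the hook fires for each tensor this op saves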
For example: if you do y = x * x (i.e. y = x squared), then the gradient is dl/dx = grad_output * 2 * x. Here, if x requires grad, we hold onto x so that the backward pass can be computed.
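A quick sketch of that example:

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x        # forward: y = x squared
# Computing dl/dx = grad_output * 2 * x later requires x,
# so y.grad_fn keeps a reference to it.
y.backward()
print(x.grad)    # tensor(6.) == 2 * x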
Take an example of:
x = torch.randn(3, requires_grad=True)
y = x ** 2
z = y ** 2
del y
Here, even though y is deleted from the Python scope, the function z = square(y) that sits in the autograd graph (effectively z.grad_fn) holds onto y, and in turn onto x.
So you might not have visibility into it via the GC, but the memory still exists until z is deleted from the Python scope (or the graph is freed some other way, e.g. by a backward() call with the default retain_graph=False).
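A small, self-contained sketch of that behaviour (the concrete values are just for illustration):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y ** 2
del y                 # y is gone from the Python scope...

# ...but z.grad_fn still references it, so the backward pass works fine:
z.backward()          # dz/dx = 4 * x**3
print(x.grad)         # tensor(108.)

# After backward() (with the default retain_graph=False), or once z itself
# is deleted, the graph and its saved tensors are released.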