When you do a forward pass for a particular operation where some of the inputs have `requires_grad=True`, PyTorch needs to hold onto some of the inputs or intermediate values so that the backward pass can be computed.

For example: if you do `y = x * x` (y = x squared), then the gradient is `dl/dx = grad_output * 2 * x`. Here, if `x` has `requires_grad=True`, then we hold onto `x` to compute the backward pass.

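Here is a minimal sketch of that behavior (the tensor values are just illustrative):

```
import torch

x = torch.tensor([3.0], requires_grad=True)
y = x * x      # forward pass: autograd saves x so it can compute the backward

y.backward()   # grad_output is 1.0 here, so dl/dx = 1.0 * 2 * x
print(x.grad)  # tensor([6.])
```
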
Take an example of:

```
import torch

x = torch.randn(3, requires_grad=True)
y = x ** 2
z = y ** 2
del y  # y is no longer reachable from Python, but the autograd graph still saves it
```

Over here, even if `y` is deleted from Python scope, the function `z = square(y)`, which is in the autograd graph (and is effectively `z.grad_fn`), holds onto `y` and in turn onto `x`.

So you might not have visibility into it via the GC, but it still exists until `z` is deleted from Python scope.
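
A hedged way to see this on recent PyTorch versions (roughly 1.9+), where the tensors saved by a `grad_fn` are exposed through its `_saved_*` attributes:

```
import torch

x = torch.randn(3, requires_grad=True)
y = x ** 2
z = y ** 2
del y

# z.grad_fn is the backward node for z = y ** 2; it saved its input (the tensor y referred to)
saved_y = z.grad_fn._saved_self
print(saved_y.shape)  # the data y pointed to is still alive

del z  # only now can the graph, the saved y, and its hold on x be released
```

Running `z.backward()` (without `retain_graph=True`) also frees these saved tensors, since the graph's buffers are released once the backward pass has run.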