A question about the implementation of the Node class in the AD (autograd) module

I am studying the source code of PyTorch's automatic differentiation and have a question. In `variable.h`, as the comment there explains, the autograd metadata of a tensor holds not only a `grad_fn` (a `std::shared_ptr<Node>`) but also a `std::weak_ptr` to its `AccumulateGrad` node, so that leaf tensors do not end up in a circular reference. Why not use `grad_fn` for both cases, and instead have the `AccumulateGrad` implementation hold a weak reference to its leaf tensor? Wouldn't that allow the two fields to be unified into one?
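To make the ownership question concrete, here is a minimal Python sketch that uses CPython reference counting (with the cyclic garbage collector switched off) as a stand-in for `std::shared_ptr`, and `weakref.ref` as a stand-in for `std::weak_ptr`. The `Tensor`, `Node` and `AccumulateGrad` classes and the layout names are hypothetical toys, not PyTorch's real classes. It assumes that, in the layout described in `variable.h`, `AccumulateGrad` keeps a strong reference to its leaf.

```python
import gc
import weakref


class Node:
    """Toy graph node: edges to the next nodes are strong references."""

    def __init__(self, next_nodes=()):
        self.next_nodes = list(next_nodes)


class AccumulateGrad(Node):
    """Toy node that would accumulate gradients into one leaf tensor."""

    def __init__(self, leaf, weak_to_leaf):
        super().__init__()
        # Keep exactly one kind of reference to the leaf.
        self._strong_leaf = None if weak_to_leaf else leaf
        self._weak_leaf = weakref.ref(leaf) if weak_to_leaf else None

    def leaf(self):
        """Return the leaf tensor, or None if it has already been freed."""
        if self._weak_leaf is not None:
            return self._weak_leaf()
        return self._strong_leaf


class Tensor:
    """Toy leaf tensor with the two slots discussed in the question."""

    def __init__(self):
        self.grad_fn = None           # strong reference to a Node
        self.grad_accumulator = None  # weakref.ref to an AccumulateGrad, or None


LAYOUTS = ("weak_accumulator", "weak_leaf", "both_strong")


def simulate(layout):
    """Return (leaf reachable from the graph after the user drops it,
    objects still alive after everything is dropped)."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout!r}")
    gc.disable()  # rely on reference counting only, like shared_ptr
    try:
        leaf = Tensor()
        acc = AccumulateGrad(leaf, weak_to_leaf=(layout == "weak_leaf"))
        if layout == "weak_accumulator":
            leaf.grad_accumulator = weakref.ref(acc)  # tensor -> accumulator is weak
        else:
            leaf.grad_fn = acc  # tensor -> accumulator is strong
        # A downstream backward node (e.g. for y = x * 2) owns the accumulator.
        graph = Node(next_nodes=[acc])
        leaf_probe, acc_probe = weakref.ref(leaf), weakref.ref(acc)

        del leaf, acc  # the user drops their handles; only `graph` remains
        leaf_reachable = graph.next_nodes[0].leaf() is not None

        del graph  # release the graph as well
        leaked = leaf_probe() is not None or acc_probe() is not None
        return leaf_reachable, leaked
    finally:
        gc.enable()  # the cyclic collector can now reclaim any leaked cycle


for layout in LAYOUTS:
    print(layout, simulate(layout))
# weak_accumulator (True, False)
# weak_leaf (False, False)
# both_strong (True, True)
```

In this toy model, making either edge weak is enough to avoid the leak that `both_strong` shows. The two one-weak layouts differ in whether the accumulator can still reach its leaf after the user has dropped every handle to it while the graph is still alive; the sketch does not say which of these behaviors PyTorch actually relies on.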