How does PyTorch work architecturally?

As with most other modern auto-differentiation libraries, PyTorch doesn't require you to explicitly define a graph variable that records computations (a computational graph).

I was wondering: how does PyTorch's autograd accomplish this? Does a tensor produced by a tensor operation store references to its child tensors? Or is there a global variable holding all tensors, which tensor handles point into? Or is it done some other way?
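To make the question concrete, here's the kind of thing I can observe from Python (a small probe, not a claim about the internals): the result of an op carries a `grad_fn` attribute, and its `next_functions` appear to link back toward the inputs somehow.

```python
import torch

# Two leaf tensors that require gradients.
a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

# The result of the multiply carries a backward-graph node...
c = a * b
print(type(c.grad_fn).__name__)    # a MulBackward-style node

# ...whose next_functions seem to reference nodes for the two inputs.
print(c.grad_fn.next_functions)
```

So is this `grad_fn` chain the whole story of how the graph is recorded, or just the Python-visible surface of it?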

I don't know C++ well, so I can't dig into the codebase to understand it, but maybe there's a good conceptual explanation that's independent of the implementation language?

Thanks in advance :slight_smile: