Will Torch automatically free activation memory?

Will PyTorch free the GPU memory of a specific layer's activation values after its backward pass has completed? The activation values are no longer needed once the gradients have been calculated.

Yes, the intermediate tensors will be removed after the backward() operation is done, as long as retain_graph=True is not used.
This is also why a second backward() call right after the first one will raise an error (if the intermediate tensors were needed to calculate the gradients).
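For illustration, a minimal sketch of this behavior (the shapes and the squaring op are arbitrary choices, not something from this thread):

```python
import torch

x = torch.randn(4, requires_grad=True)
y = (x * x).sum()   # the multiply saves its inputs as intermediates

y.backward()        # the saved intermediates are freed afterwards
try:
    y.backward()    # fails: the buffers needed for the gradients are gone
except RuntimeError as err:
    print("second backward raised:", err)

# retain_graph=True keeps the saved tensors alive for another pass
z = (x * x).sum()
z.backward(retain_graph=True)
z.backward()        # works this time (gradients accumulate into x.grad)
```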


The intermediate tensors will be removed after the backward() operation is done

That’s awesome.
Could you please show me in which file this “remove operation” is executed?

I don’t know where exactly the intermediates are freed, but @albanD might know. 🙂

Hey,

Each “Node” in the graph implements a special function to release all the resources it can, here: pytorch/function.h at a46d56f988505547b0779838a022970b79123b3c · pytorch/pytorch · GitHub
This is called from the execution engine when retain_graph=False, here: pytorch/engine.cpp at a46d56f988505547b0779838a022970b79123b3c · pytorch/pytorch · GitHub
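For anyone who wants to see the effect without reading the C++, here is a rough sketch that observes it from Python (it assumes a CUDA device is available; the shapes and the multiply chain are arbitrary):

```python
import torch

x = torch.randn(1024, 1024, device="cuda", requires_grad=True)
torch.cuda.synchronize()
baseline = torch.cuda.memory_allocated()

out = x
for _ in range(10):
    out = out * x          # each multiply saves its inputs for the backward pass
loss = out.sum()
del out                    # drop the Python reference; the autograd graph
                           # still keeps the saved intermediates alive

print("graph alive:   ", torch.cuda.memory_allocated() - baseline)

loss.backward()            # the engine releases the saved buffers here
print("after backward:", torch.cuda.memory_allocated() - baseline)
# only x.grad remains from the extra allocations at this point
```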


That’s incredible. Thank you!