Will PyTorch free the GPU memory of a specific layer's activation values after its backward pass has finished? The activation values are no longer needed once the gradients have been calculated.
Yes, the intermediate tensors will be removed after the backward() operation is done, as long as retain_graph=True is not used.
This is also why a second backward() call right after the first one will raise an error (if the intermediate tensors were needed to calculate the gradients).
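A minimal sketch of this behavior (the variable names are illustrative, not from the original thread): the first backward() frees the saved intermediate tensors, so a second backward() through the same graph raises a RuntimeError unless the graph was retained.

```python
import torch

# A tiny graph whose backward needs saved intermediate tensors.
x = torch.randn(3, requires_grad=True)
y = (x * x).sum()  # the multiplication saves x for its backward

y.backward()  # frees the saved buffers (retain_graph defaults to False)

# A second backward through the same graph now fails, because the
# intermediate buffers were already released.
try:
    y.backward()
except RuntimeError as e:
    print("second backward failed:", type(e).__name__)

# With retain_graph=True the buffers are kept, so backward can run twice.
z = (x * x).sum()
z.backward(retain_graph=True)
z.backward()  # works: the graph was retained by the first call
```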
"The intermediate tensors will be removed after the backward() operation is done"
That’s awesome.
Could you please show me in which file this “remove operation” is executed?
Hey,
Each “Node” in the graph implements a special function to release all the resources it can, here: pytorch/function.h at a46d56f988505547b0779838a022970b79123b3c · pytorch/pytorch · GitHub
This is called from the execution engine when retain_graph=False, here: pytorch/engine.cpp at a46d56f988505547b0779838a022970b79123b3c · pytorch/pytorch · GitHub
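You can also observe this release from Python, without reading the C++ sources. A hedged sketch (it relies on the underscore-prefixed `_saved_*` attributes that PyTorch exposes on grad_fn nodes, which are semi-private and may change between versions): after a backward() with retain_graph=False, accessing a node's saved tensor raises a RuntimeError because the engine has already freed it.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()   # ExpBackward saves its result tensor for the backward pass
s = y.sum()

# Before backward, the saved tensor is still reachable via the grad_fn.
assert torch.equal(y.grad_fn._saved_result, y)

s.backward()

# After backward with retain_graph=False, the engine has released the
# saved buffers; accessing them now raises a RuntimeError.
try:
    _ = y.grad_fn._saved_result
except RuntimeError:
    print("saved result was freed by the engine")
```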
That’s incredible. Thank you!