I wrote a C++/CUDA extension that produces a gradient tensor dLdX, where
L is the loss and
X is a tensor. When I call
X.backward(dLdX), the memory dLdX occupies (allocated in C++) does not get recycled. My question is: how can I tell the computational graph to recycle
dLdX once it is no longer needed during the execution of the backward pass?
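A minimal, self-contained version of what I am doing (a plain tensor and a trivial graph stand in here for my actual extension and model):

```python
import torch

# Stand-in for my setup: in the real code, dLdX is allocated by the
# C++/CUDA extension; here torch.ones_like plays that role, and the
# graph producing X is a trivial one.
dev = "cuda" if torch.cuda.is_available() else "cpu"

W = torch.randn(8, 8, device=dev, requires_grad=True)
X = W * 2                         # X is produced by a differentiable graph
dLdX = 2.0 * torch.ones_like(X)   # stand-in for the extension's gradient tensor

X.backward(dLdX)                  # seed the backward pass with the external gradient
# W.grad now equals dLdX * 2; dLdX itself is still alive because this
# Python frame holds a reference to it.
```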
How did you check that the memory is not recycled? Was
torch.cuda.memory_allocated() higher after the operation, and did it stay that way?
If so, are you storing any references to it in your extension?
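For example, one way to check would be something like this sketch (the CUDA part is guarded, and the tensors here are just placeholders for your extension's output):

```python
import torch

def allocated_delta(fn):
    """Run fn and return the net change in allocated CUDA memory, in bytes."""
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.memory_allocated() - before

if torch.cuda.is_available():
    w = torch.randn(1024, 1024, device="cuda", requires_grad=True)
    x = w * 2
    grad = torch.ones_like(x)  # stand-in for the extension's dLdX
    delta = allocated_delta(lambda: x.backward(grad))
    print(f"net allocation change during backward: {delta} bytes")
```

If the delta stays positive across iterations and never drops, something is holding a reference to the freed-looking memory.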
Thank you so much for the reply!
If I call
X.backward(dLdX), it does release
dLdX's GPU memory afterwards. But I am wondering if there is a way to do that in the middle of the backward pass. The analogy would be as if X were an intermediate variable whose gradient gets released as soon as it is no longer needed for the rest of the backprop. Am I making sense?
I made another post after I realized I was not asking the right question – this behavior is not specific to tensors allocated in C++.