How to recycle gradient tensors allocated in C++ during backprop

Hi folks,

I wrote a C++/CUDA extension that produces a gradient tensor, e.g. dLdX, where L is the loss and X is a tensor. When I call X.backward(dLdX), the memory dLdX occupies (allocated in C++) does not get recycled. My question is: how can I tell the computational graph to recycle dLdX once it is no longer needed during the execution of X.backward(dLdX)?
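
For context, here is a simplified sketch of what I am doing (the Linear layer and compute_grad below are just stand-ins for my real network and my actual C++/CUDA extension):

    import torch

    linear = torch.nn.Linear(128, 128).cuda()                      # stand-in for my network
    inp = torch.randn(64, 128, device="cuda", requires_grad=True)
    X = linear(inp)

    def compute_grad(x):
        # placeholder for the extension call that allocates dLdX on the GPU
        return torch.ones_like(x)

    dLdX = compute_grad(X)
    # the question: how can the graph recycle dLdX once it is no longer needed here?
    X.backward(dLdX)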

Thanks!

How did you check that the memory is not recycled? Was torch.cuda.memory_allocated() higher after the operation, and did it stay that way?
If so, are you storing any references to dLdX in your extension?
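
Something along these lines (substituting your extension's tensor for the stand-in dLdX here):

    import torch

    linear = torch.nn.Linear(128, 128).cuda()
    inp = torch.randn(64, 128, device="cuda", requires_grad=True)
    X = linear(inp)
    dLdX = torch.ones_like(X)               # in your case: the tensor from the extension

    print(torch.cuda.memory_allocated())    # with dLdX alive
    X.backward(dLdX)
    print(torch.cuda.memory_allocated())    # does it stay higher afterwards?

    del dLdX                                # drop the Python-side reference as well
    print(torch.cuda.memory_allocated())    # and now?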

Thank you so much for the reply!

If I call torch.cuda.memory_allocated() after X.backward(dLdX), dLdX's GPU memory is indeed released. But I am wondering whether there is a way to release it in the middle of the backward pass. The analogy would be if X were an intermediate variable, whose gradient is released as soon as it is no longer needed for the rest of the backprop. Am I making sense?
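
To make that concrete, here is roughly what I would like to observe, using a hook to look at allocated memory partway through the backward pass (same stand-in setup as in my first post). Ideally the mid-backward number would already reflect dLdX having been released, since the remaining part of the graph no longer needs it:

    import torch

    linear = torch.nn.Linear(128, 128).cuda()
    inp = torch.randn(64, 128, device="cuda", requires_grad=True)
    X = linear(inp)
    dLdX = torch.ones_like(X)                    # stand-in for the extension's output

    def report(grad):
        # fires partway through backward, once the gradient w.r.t. inp is ready
        print("mid-backward :", torch.cuda.memory_allocated())

    inp.register_hook(report)
    X.backward(dLdX)
    print("after backward:", torch.cuda.memory_allocated())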

I made another post after I realized I was not asking the right question – this behavior is not specific to tensors allocated in C++.