How to recycle gradient tensors allocated in C++ during backprop

Thank you so much for the reply!

If I check torch.cuda.memory_allocated() after X.backward(dLdX), I can see that dLdX's GPU memory has been released. But I am wondering whether there is a way to do that in the middle of the backward pass. The analogy would be as if X were an intermediate variable whose gradient is released as soon as it is no longer needed for the rest of the backprop. Am I making sense?
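To make the question concrete, here is a minimal sketch of how I have been observing memory in the middle of the backward pass (the tensor names and shapes are just illustrative, not from my actual model): tensor hooks fire while backward() is still running, so reading torch.cuda.memory_allocated() inside them shows whether an intermediate gradient buffer has already been freed at that point.

```python
import torch

device = "cuda"

A = torch.randn(1024, 1024, device=device, requires_grad=True)
B = torch.randn(1024, 1024, device=device, requires_grad=True)

Y = A @ B      # intermediate (non-leaf) tensor
X = Y.relu()   # output we call backward on

def report(name):
    def hook(grad):
        # Runs while the backward pass is still in progress, so this
        # reading reflects memory usage mid-backprop, not after it ends.
        print(f"{name}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")
        return grad
    return hook

# Hook on the intermediate tensor: called when dL/dY is produced.
Y.register_hook(report("after dL/dY is produced"))
# Hook on a leaf: called when dL/dA is produced, after dL/dY was consumed.
A.register_hook(report("after dL/dA is produced"))

dLdX = torch.ones_like(X)
X.backward(dLdX)
print(f"after backward: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")
```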

I made another post after I realized I was not asking the right question – this behavior is not specific to tensors allocated in C++.