Who gets memory rights on tensors created in CUDA code?

Hi,

I was wondering who has "memory rights" on tensors created in a PyTorch CUDA extension.

I have created several custom CUDA extensions following this tutorial: https://pytorch.org/tutorials/advanced/cpp_extension.html.

I am now using those extensions in a training loop (i.e. a network uses some of my custom operations) and I noticed that the memory usage strictly increases at each training iteration. I think it might come from those custom operations, so I would like to know:

  1. If a tensor is created in CUDA code bound to Python and then returned by the custom function, and the Python script then deletes this tensor, does that effectively free the memory?

  2. What if a tensor is created in CUDA code but only holds intermediate results and so is not returned by the CUDA function: is it deleted once we exit the function scope, like in Python, or do we have to delete it manually? If we do, and if the answer to the first question is yes, is the recommended practice to return all created tensors and then delete them in Python, or to handle them in the CUDA code itself?

I hope my question is clear, thank you for your help in advance.

Samuel

and the Python script then deletes this tensor, does that effectively free the memory?

Yes

is it deleted once we exit the function scope, like in Python, or do we have to delete it manually?

It is deleted when you exit the scope, unless you return it.

How do you monitor the CUDA memory? Note that we have a caching allocator that does not return freed memory to the driver, so the usage reported by nvidia-smi will not go down when tensors are freed.
You can use torch.cuda.memory_allocated() to see the memory actually used to store tensors.
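A minimal sketch of that kind of monitoring, assuming a hypothetical `report_cuda_memory` helper (not part of PyTorch) that you call once per training iteration. `torch.cuda.memory_allocated()` counts the bytes held by live tensors, while `torch.cuda.memory_reserved()` counts what the caching allocator is holding on to, which is closer to the figure nvidia-smi shows.

```python
import torch


def report_cuda_memory(tag=""):
    """Hypothetical helper: print and return (allocated, reserved) in bytes."""
    if not torch.cuda.is_available():
        # No GPU: nothing to measure, report zeros so callers still work.
        allocated, reserved = 0, 0
    else:
        allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
        reserved = torch.cuda.memory_reserved()    # bytes held by the caching allocator
    mib = 1024 ** 2
    print(f"{tag} allocated={allocated / mib:.1f} MiB reserved={reserved / mib:.1f} MiB")
    return allocated, reserved


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    report_cuda_memory("before")
    x = torch.empty(1024, 1024, device=device)  # 4 MiB of float32
    report_cuda_memory("during")
    del x  # the block goes back to the allocator's cache, not to the driver
    report_cuda_memory("after")
```

If "allocated" returns to its starting value after each iteration while "reserved" stays high, the caching allocator is behaving as described above and nothing is leaking.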

Thank you for your answer. I used torch.cuda.empty_cache() followed by torch.cuda.memory_summary() to monitor the memory. I noticed an increasing gap between the total allocated memory and the total freed memory. This gap gets larger with the number of iterations. This is why I suspect that some memory allocated at every iteration is never freed.
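For anyone who wants to track that gap numerically rather than reading it off the summary table, here is a rough sketch. The helper names `live_bytes_from_stats` and `growth_per_step` are hypothetical; the two counters are read from `torch.cuda.memory_stats()`, and `step_fn` stands for whatever runs one training iteration.

```python
import torch


def live_bytes_from_stats():
    """Total allocated minus total freed, from the caching allocator's counters."""
    if not torch.cuda.is_available():
        return 0  # no CUDA device, nothing to count
    stats = torch.cuda.memory_stats()
    allocated = stats.get("allocated_bytes.all.allocated", 0)
    freed = stats.get("allocated_bytes.all.freed", 0)
    return allocated - freed


def growth_per_step(step_fn, n_steps=5):
    """Call step_fn n_steps times; return the change in live bytes after each call."""
    deltas = []
    previous = live_bytes_from_stats()
    for _ in range(n_steps):
        step_fn()
        current = live_bytes_from_stats()
        deltas.append(current - previous)
        previous = current
    return deltas
```

Deltas that settle to zero after a few steps suggest normal warm-up; deltas that stay positive on every step suggest tensors are being kept alive somewhere.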

But do you actually run out of memory?
It is expected that PyTorch will allocate more and more memory during the first iterations, then stabilize after a while.

If you do run out of memory, you should try to pinpoint where it happens. Try replacing your cpp code with a Python version that just returns an output of the right size, and see if it still happens.
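One way to set up that comparison, as a sketch only: `my_cuda_op_standin` is a hypothetical placeholder for the extension call, and the assumption that the output has the same shape as the input is mine; substitute whatever shape your operation really returns.

```python
import torch


def my_cuda_op_standin(x):
    """Hypothetical stand-in for the custom op: right shape/dtype/device, no C++ code.

    Multiplying by zero keeps the result attached to the autograd graph, so
    backward() still runs through the rest of the network.
    """
    # Assumption: the real op returns a tensor shaped like its input.
    return x * 0.0


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for step in range(5):
        x = torch.randn(64, 128, device=device, requires_grad=True)
        loss = my_cuda_op_standin(x).sum()
        loss.backward()
        if device == "cuda":
            # Flat with the stand-in but growing with the real op points at
            # the extension or its autograd wrapper.
            print(step, torch.cuda.memory_allocated())
```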

Yes, I do run out of memory very fast, as the difference between allocated memory and freed memory is approximately 2GB at every iteration. I am almost certain this is not normal behavior and that I am doing something wrong.

I will try your suggestion and see how it goes. Thank you for your help.

Hi,

For those of you who run into a similar issue in the future: I solved my problem after seeing this post:


So I manually deleted the context attributes holding tensors obtained from my custom CUDA operations, and that fixed the memory leak.

You should use save_for_backward() instead of deleting them :smiley:
Nice find on the root cause.
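For readers landing here, a sketch of what that looks like in a custom `torch.autograd.Function`. The `Square` op is a hypothetical stand-in for a function wrapping an extension call; the point is only that tensors needed in `backward` go through `ctx.save_for_backward()` rather than being stored as plain attributes such as `ctx.x = x`.

```python
import torch


class Square(torch.autograd.Function):
    """Hypothetical op (y = x * x) standing in for a wrapper around an extension call."""

    @staticmethod
    def forward(ctx, x):
        # Register the tensor with autograd instead of writing `ctx.x = x`.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve what forward() saved; autograd manages its lifetime.
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output


if __name__ == "__main__":
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    Square.apply(x).sum().backward()
    print(x.grad)
```

Tensors saved this way are released by autograd along with the graph, so no manual `del` on the context is needed.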
