I am now using those extensions in a training loop (i.e. a network uses some of my custom operations), and I noticed that memory usage strictly increases at each training iteration. I think it might come from those custom operations, so I would like to know:
If a tensor is created in CUDA code bound to Python and then returned by the custom function, and the Python script then deletes this tensor, does that effectively free the memory?
What if a tensor created in the CUDA code only holds intermediate results and so is not returned by the CUDA function: is it deleted once we exit the function scope, like in Python, or do we have to delete it manually? If the latter, and if the answer to the first question is yes, is the recommended practice to return all created tensors and then delete them in Python, or to handle them in the CUDA code itself?
I hope my question is clear; thank you in advance for your help.
and the Python script then deletes this tensor, does that effectively free the memory?
is it deleted once we exit the function scope, like in Python, or do we have to delete it manually?
It is deleted when you exit the scope, unless you return it: tensors are reference-counted, so the storage is freed as soon as the last reference to it goes away.
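A way to picture this is with a small pure-Python sketch (the `FakeTensor` class below is hypothetical and only mimics the reference-counted behavior; it is not the actual libtorch implementation): an intermediate that is not returned loses its last reference when the function exits and is released immediately, while a returned tensor stays alive until the caller drops it.

```python
class FakeTensor:
    """Hypothetical stand-in for a reference-counted tensor.

    The real C++ tensors behave similarly: the underlying storage is
    released as soon as the last reference to it disappears.
    """
    freed = []  # records which buffers have been released

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # In the real library this is where the storage would be
        # returned to the allocator.
        FakeTensor.freed.append(self.name)


def custom_op():
    intermediate = FakeTensor("intermediate")  # not returned
    result = FakeTensor("result")              # returned to the caller
    return result


out = custom_op()
print("intermediate" in FakeTensor.freed)  # True: freed at scope exit
print("result" in FakeTensor.freed)        # False: still referenced by `out`

del out
print("result" in FakeTensor.freed)        # True: last reference dropped
```

So there is no need to return intermediates just to delete them from Python; dropping the last reference, wherever it lives, is what frees the storage.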
How do you monitor the CUDA memory? Note that we have a caching allocator that does not return memory to the driver, so the memory reported by nvidia-smi will only ever grow.
You can use torch.cuda.memory_allocated() to see the memory actually used to store Tensors.
Thank you for your answer. I used torch.cuda.empty_cache() followed by torch.cuda.memory_summary() to monitor the memory. I noticed a growing gap between the total allocated memory and the total freed memory, and the gap widens with the number of iterations. This is why I suspect that some memory allocated at every iteration is not being freed.
Yes, I do run out of memory very fast: the difference between allocated and freed memory grows by approximately 2 GB at every iteration. I am almost certain this is not normal behavior and that I am doing something wrong.
I will try your suggestion and see how it goes. Thank you for your help.