I think I’m missing something in my understanding of the CUDA memory management. I was under the impression that if one deletes all references to objects that were stored on a GPU and subsequently frees the cache, the allocated memory should be zero.
Since my code is part of a larger project and I have so far been unable to reproduce the behaviour with a minimal example, I'll show you a simplified version of what my code is doing. Consider the following snippet:
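(The module and tensor sizes below are placeholders; my real model is larger, but the pattern is the same.)

```python
import gc
import torch
import torch.nn as nn

def display_memory():
    # Memory currently held by tensors on the GPU (0 if CUDA is unavailable).
    allocated = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    print(f"{allocated / 1024**3:.3f} GB")

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(1024, 1024).to(device)   # placeholder for my real module
x = torch.randn(64, 1024, device=device)
display_memory()

out = model(x).detach()   # detach so no autograd graph is kept alive
display_memory()          # the detached output adds almost nothing

del model, x, out
gc.collect()
torch.cuda.empty_cache()  # hand cached blocks back to the driver
display_memory()          # close to zero, apart from ~1 MB that remains
```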
As expected, the detached output adds almost no extra memory, and after the del statement the memory is almost completely freed. Since this could be related to my later question: can someone explain the ~1 MB that remains after all Tensors and nn.Modules are deleted?
For my actual code the following is displayed if I insert the display_memory() function at the appropriate places:
0.000 GB
0.956 GB
0.946 GB
Even after deleting every Tensor, a large part of the memory remains occupied.
Can anyone think of anything I could be doing wrong?
Do you have a resource where I can read about that?
Can this somehow be freed? In my case I need to call the fn() function multiple times, and after a few iterations I no longer have enough memory left to execute it.
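In simplified form, the loop and the cleanup I do between calls look roughly like this (fn() here is just a placeholder for my real, memory-hungry function):

```python
import gc
import torch

def fn():
    # Placeholder for the real function; it allocates large GPU tensors.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.randn(256, 256, device=device)

results = []
for _ in range(5):
    out = fn().detach().cpu()  # detach and move the result off the GPU
    results.append(out)
    del out
    gc.collect()               # drop any lingering Python references
    torch.cuda.empty_cache()   # release cached blocks back to the driver
    # torch.cuda.memory_allocated() drops back down here, yet nvidia-smi
    # still shows several hundred MB held by the process.
```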
I think you are right about the stored context information being the source of my problem. I've temporarily reduced the memory requirement of my fn() and tested it within a loop. After a few iterations the remaining memory stops growing and stays at ~900 MB. I don't think I'm accidentally storing the computation graph, since
I’ve detached the result of fn(), and
I’ve deleted every Tensor out of desperation, with no effect.
Do you have any idea why the size of the CUDA context grows when I execute fn() multiple times? I was planning on doing something like this:
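Roughly this pattern (a sketch; fn and batches are placeholders for my actual function and data):

```python
import torch

def fn(batch):
    # Placeholder for my real function.
    return batch * 2

batches = [torch.randn(8, 16) for _ in range(10)]

# Call fn() once per batch and keep only detached CPU copies of the results.
results = [fn(b).detach().cpu() for b in batches]
```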