How is GPU memory allocated and released? Procedure to release GPU memory?

I would really like to understand this properly. There are many threads about GPU memory problems, but it is hard to get a proper understanding of the details behind how PyTorch manages memory on the GPU.

I have a problem where I run out of memory when doing cross-validation: for each fold, I load and train a new model, then evaluate it, before the same Python variable is reused to hold the next fold's model, train that, and so on.

Initially I thought that if a Python variable references a tensor on the GPU and that Python variable then gets reassigned, the memory on the GPU would get released because no variable is pointing to it any more. But this does not seem to be the case: the GPU memory needed by the model seems to accumulate over the folds even though the Python variable that holds the model gets reassigned.

However, if I move the model to the CPU and call torch.cuda.empty_cache() before reassigning the variable, the memory does not seem to accumulate, or at least not as much.
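
Roughly, the loop looks like this (a minimal sketch; my real model, training, and evaluation code is omitted and replaced with stand-ins):

```python
import torch
import torch.nn as nn

n_folds = 5
model = None
for fold in range(n_folds):
    # stand-in for loading and training a fresh model for this fold
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to("cuda")
    # ... training and evaluation for this fold would happen here ...

    # workaround that seems to stop the memory from accumulating:
    model = model.to("cpu")
    torch.cuda.empty_cache()

    print(f"fold {fold}: allocated={torch.cuda.memory_allocated()} "
          f"reserved={torch.cuda.memory_reserved()}")
```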

Is this intended behaviour? Should GPU memory deallocation happen automatically or is a specific procedure needed? What is going on?

Hi,

The idea is that we have a caching allocator.
So when you need to put a Tensor on the GPU, we ask the caching allocator. If it already has enough free space, it just returns it. Otherwise, it asks the GPU driver for new memory.
When the Tensor is destroyed, the memory is kept around by the allocator.
The memory is only released back to the GPU driver if you are about to OOM or if the user calls empty_cache().
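
To make this concrete, here is a small sketch (on a machine with a CUDA device and a recent PyTorch; the exact numbers will vary). `torch.cuda.memory_allocated()` reports memory held by live tensors, while `torch.cuda.memory_reserved()` reports what the caching allocator currently holds from the driver:

```python
import torch

# nothing allocated yet
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # 0 0

x = torch.randn(1024, 1024, device="cuda")  # allocator requests memory from the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # both > 0

del x  # the Tensor is destroyed, but the allocator keeps the block in its cache
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # allocated back to 0, reserved unchanged

y = torch.randn(1024, 1024, device="cuda")  # reuses the cached block, no new driver request
del y

torch.cuda.empty_cache()  # cached blocks are returned to the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # reserved drops back down
```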

Does that answer your question?

So when a Python variable that refers to a tensor on the GPU directly or indirectly (e.g. by referring to a module that in turn refers to one or more tensors on the GPU) gets reassigned, the GPU memory of all tensors that were referenced directly or indirectly should become available for reuse?

My problem is that I expected this to happen but it does not appear to happen when I just reassign a variable that points to a torch module on the GPU. Instead I have to first move the module back to the CPU (and manually call `empty_cache()`, though I am not sure if this is really required).

> the GPU memory of all tensors that were referenced directly or indirectly should become available for reuse?

Yes exactly.
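
For example, something like this (a minimal sketch, assuming nothing else, such as an optimizer or a saved loss, still references the old module or its parameters):

```python
import torch
import torch.nn as nn

net = nn.Linear(4096, 4096).to("cuda")
before = torch.cuda.memory_allocated()

# Reassigning the variable drops the only reference to the old module,
# so its parameters' memory goes back into the allocator's cache.
net = nn.Linear(4096, 4096).to("cuda")
after = torch.cuda.memory_allocated()

print(before, after)                 # roughly the same, not doubled
print(torch.cuda.memory_reserved())  # the cache keeps the freed memory until empty_cache()
```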

> My problem is that I expected this to happen but it does not appear to happen

Could you post a small code sample (30-40 lines) that reproduces this, please?