Help understanding how to release GPU memory / avoid leaks


I’ve looked around online but still haven’t been able to figure out how to properly free GPU memory. Here’s a link to a simple Colab demo showing the situation [make sure to change the Runtime type to use a GPU]:

I basically allocate a random tensor, move it to the GPU, and report the GPU memory usage; then I move the tensor back to the CPU and report the usage again; finally I delete the reference to the tensor and report the usage once more. On the last two reports I’d expect the usage to go back to zero, but I keep getting a non-zero value (e.g. the usage goes down from 37% to 4%, but not to zero).
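For reference, the steps above can be sketched roughly as follows. This is not the code from the Colab notebook: the helper names `snapshot` and `run_demo` and the tensor size are made up for illustration, and it reports PyTorch's own allocator counters rather than the percentage quoted above. The call to `torch.cuda.empty_cache()` is an extra step, added because PyTorch keeps freed blocks in a cache by default.

```python
try:
    import torch
except ImportError:  # PyTorch not installed; run_demo() then returns []
    torch = None


def snapshot(label):
    """Return (label, MiB held by live tensors, MiB held by PyTorch's cache)."""
    mib = 1024 ** 2
    return (label,
            torch.cuda.memory_allocated() / mib,
            torch.cuda.memory_reserved() / mib)


def run_demo(n=4096):
    """Repeat the experiment; returns three snapshots, or [] without a GPU."""
    if torch is None or not torch.cuda.is_available():
        return []
    reports = []
    x = torch.randn(n, n)      # random float32 tensor on the CPU
    x = x.cuda()               # copy it to the GPU
    reports.append(snapshot("on GPU"))
    x = x.cpu()                # copy back; the GPU copy is no longer referenced
    reports.append(snapshot("back on CPU"))
    del x                      # drop the last reference
    torch.cuda.empty_cache()   # return cached blocks to the driver
    reports.append(snapshot("after del + empty_cache"))
    return reports


if __name__ == "__main__":
    for label, allocated, reserved in run_demo():
        print(f"{label}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```

These counters only cover memory managed by PyTorch's allocator, so they can read zero while a tool that looks at the whole device still shows some usage.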

How can I get the GPU memory back to zero in this simple case? Is this an issue with Google Colab specifically?

Thanks in advance!

This likely isn’t a leak.
PyTorch initializes CUDA “on demand” the first time you use it, and as part of this initialization some global GPU memory is allocated. You should not expect this memory to be freed before the Python process terminates.
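One rough way to see this overhead is to compare what the device reports as used with what PyTorch's allocator holds. A minimal sketch, assuming a reasonably recent PyTorch where `torch.cuda.mem_get_info` is available; the function name `context_overhead_mib` is made up here, and the number also includes memory used by any other process on the same GPU:

```python
try:
    import torch
except ImportError:  # PyTorch not installed; the function then returns None
    torch = None


def context_overhead_mib():
    """Device memory in use that PyTorch's allocator does not account for.

    Returns None when no CUDA device is available.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.init()                             # force the lazy CUDA initialization
    torch.cuda.empty_cache()                      # release any cached, unused blocks
    free_b, total_b = torch.cuda.mem_get_info()   # (free, total) bytes on the device
    used_b = total_b - free_b
    reserved_b = torch.cuda.memory_reserved()     # bytes held by PyTorch's allocator
    return (used_b - reserved_b) / 1024 ** 2


if __name__ == "__main__":
    print(context_overhead_mib())
```

If this stays well above zero in a fresh process with no tensors on the GPU, that is most likely the initialization memory described above rather than a leak.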

I wonder what this global memory is and what it is used for.

I think it holds the state of the device and also work areas for various libraries.

You can poke around in the relevant PyTorch source directories and read up on the context in the CUDA docs and the docs of libraries like cuDNN, cuBLAS, etc. Typically the workflow is that you get a global handle. (It seems that NVIDIA + Cloudflare helpfully decided I should not link to the NVIDIA docs any more, so you’ll have to find those yourself.)