CUDA memory is not freed properly

I think I’m missing something in my understanding of CUDA memory management. I was under the impression that if one deletes all references to objects that were stored on a GPU and subsequently frees the cache, the allocated memory should drop to zero.

Since my code is part of a larger project and I have so far been unable to reproduce the behaviour with a minimal example, I’ll show you a simplified version of what my code is doing. Consider the following snippet:

import torch
from torchvision.models import vgg19

device = torch.device("cuda:0")

def display_memory():
    # Release cached blocks first so only memory held by live tensors is counted.
    torch.cuda.empty_cache()
    memory = torch.cuda.memory_allocated(device)
    print("{:.3f} GB".format(memory / 1024 ** 3))

def fn():
    input = torch.rand(16, 3, 512, 512, device=device)
    model = vgg19().features.to(device)
    # Only the detached output leaves the function; input and model go out of scope here.
    return model(input).detach()

display_memory()

output = fn()
print(output.size())
display_memory()

del output
display_memory()

This prints the following:

0.000 GB
0.009 GB
0.001 GB

As expected, the detached output adds almost no extra memory, and after the del statement the memory is almost completely freed. Since this could be related to my later question: can someone explain the remaining 1 MB after all Tensors and nn.Modules have been deleted?
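To dig into that, the allocator state can be inspected in a bit more detail after the del step; a small sketch (memory_reserved() and memory_summary() need a reasonably recent PyTorch, in older versions memory_reserved was called memory_cached):

# Inspect the allocator after `del output`: memory_allocated counts live
# tensors, memory_reserved counts the blocks the caching allocator still holds.
print(torch.cuda.memory_allocated(device))
print(torch.cuda.memory_reserved(device))
print(torch.cuda.memory_summary(device))  # detailed per-pool breakdown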

For my actual code the following is displayed if I insert display_memory() calls at the appropriate places:

0.000 GB
0.956 GB
0.946 GB

Even after deleting every Tensor, a large part of the memory remains occupied.

Can anyone think of something I could be doing wrong?
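To rule out hidden references, one check I can run is listing every CUDA tensor the garbage collector still knows about; a small sketch:

import gc
import torch

# Print any CUDA tensor that is still reachable; some objects raise on
# attribute access, hence the broad except.
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) and obj.is_cuda:
            print(type(obj), obj.size())
    except Exception:
        pass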

As far as I know, the CUDA context usually uses some memory on your card, which depends on the driver, GPU, CUDA version, etc.

How much is “some memory”? That would explain the remaining 1 MB of the example, but I think almost 1 GB is too much, or am I wrong?

If it helps, this is my setup:

driver: 418.67
GPU:    GeForce GTX 1080
CUDA:   10.1

In my case, it seems to use approx. 950 MB for:

driver: 418.56
GPU:    Titan V
CUDA:   10.1

In that case I have some follow-up questions:

  1. Do you have a resource where I can read about that?
  2. Can this somehow be freed? In my case I need to call the fn() function multiple times and after a few iterations I don’t have enough memory left to execute it.

  1. Based on this topic it seems the best way would be to directly measure the memory usage on your platform (see the sketch below).

  2. No, as the CUDA context holds the management data which is necessary to run CUDA applications.
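A minimal sketch of such a measurement, assuming a PyTorch version that exposes torch.cuda.mem_get_info() (otherwise comparing nvidia-smi against memory_reserved() gives roughly the same number):

import torch

device = torch.device("cuda:0")

# Force the CUDA context to be created with a tiny allocation.
torch.zeros(1, device=device)
torch.cuda.synchronize(device)

free, total = torch.cuda.mem_get_info(device)   # bytes as reported by the driver
reserved = torch.cuda.memory_reserved(device)   # bytes held by PyTorch's caching allocator

# What the driver reports as used, minus what PyTorch itself reserves, is roughly
# the CUDA context (plus anything other processes on the GPU occupy).
print("approx. context size: {:.0f} MB".format((total - free - reserved) / 1024 ** 2))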

Are you seeing a growing memory usage, i.e. could you be (accidentally) storing the computation graph even though it might not be necessary?
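If the graph were being kept alive, running the forward pass under torch.no_grad() would rule that out; a minimal variant of the earlier snippet (fn_no_grad is just an illustrative name, only the context manager is new):

import torch
from torchvision.models import vgg19

device = torch.device("cuda:0")

def fn_no_grad():
    # Under no_grad() no autograd graph is built, so the returned tensor
    # cannot keep intermediate activations alive.
    with torch.no_grad():
        input = torch.rand(16, 3, 512, 512, device=device)
        model = vgg19().features.to(device)
        return model(input)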

I think you are right about the stored context information being the source of my problem. I’ve temporarily reduced the memory requirement of my fn() and tested it within a loop (a sketch of that loop follows the list below). After a few iterations the remaining memory stops growing and stays at ~900 MB. I don’t think I accidentally store the computation graph, since

  1. I’ve detached the result of fn(), and
  2. I’ve deleted every Tensor out of desperation without effect.
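Roughly, the loop test looked like this (a simplified sketch; measure_loop and the iteration count are just placeholders):

def measure_loop(n_iters=5):
    for i in range(n_iters):
        out = fn()
        del out
        torch.cuda.empty_cache()
        # memory_allocated drops back to ~0 each iteration; the memory that
        # nvidia-smi still shows is the CUDA context, which stops growing
        # after a few runs.
        print(i, torch.cuda.memory_allocated(device) / 1024 ** 3)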

Do you have any idea why the size of the CUDA context grows if I execute fn() multiple times? I was planning on doing something like this:

def main():
    fn(parameter_set1)
    fn(parameter_set2)
    ...

If fn() needs more memory than the total GPU memory minus the CUDA context, the second call fails. Unfortunately this is the case for me.
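To see how close a single call gets to that limit, one thing I can do is record its peak allocation and compare it against the total memory minus the context; a sketch (run_with_peak_report is just a hypothetical helper around the built-in peak-memory counters):

def run_with_peak_report(func, *args, **kwargs):
    # Reset the peak counter, run one call, and report how much memory it
    # needed at its highest point.
    torch.cuda.reset_peak_memory_stats(device)
    result = func(*args, **kwargs)
    peak = torch.cuda.max_memory_allocated(device)
    print("peak during call: {:.3f} GB".format(peak / 1024 ** 3))
    return result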