Free all GPU memory used in between runs

Hi pytorch community,

I was hoping to get some help on ways to completely free GPU memory after a single iteration of model training.

This process is part of a Bayesian optimisation loop involving a molecular docking program that also runs on the GPU, so I cannot terminate the code halfway to “free” the memory.
The cycle looks something like this:

  1. Run docking
  2. Train model to emulate docking
  3. Run inference and choose the best data points
  4. Repeat (10 times or so)

In between each docking and model-training step, pytorch seems to hold on to a block of memory (as shown in nvtop and nvidia-smi). Despite deleting the model and the optimizer by calling del on them, and running gc.collect() & torch.cuda.empty_cache(), this memory is still taken up and my docking program runs into OOM errors. The model training and inference are nicely wrapped in a function, and nothing should outlive that function’s scope, as the predictions are converted to numpy arrays and appended to a pandas DataFrame. I’ve tried my best to isolate the ML code from the docking code as much as possible.
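For reference, a minimal sketch of what one iteration and its cleanup look like on my side (the linear model, optimizer, and function name are placeholders standing in for my actual emulator code):

```python
import gc

import torch
import torch.nn as nn


def train_and_predict(train_x, train_y, test_x):
    # Placeholder model/optimizer standing in for my actual emulator.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = nn.Linear(train_x.shape[1], 1).to(device)
    optimizer = torch.optim.Adam(model.parameters())

    for _ in range(10):  # short training loop
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(train_x.to(device)),
                                      train_y.to(device))
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        # Only numpy arrays leave this function.
        preds = model(test_x.to(device)).cpu().numpy()

    # Cleanup I'm running before the next docking step:
    del model, optimizer
    gc.collect()
    torch.cuda.empty_cache()
    return preds
```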

I was hoping to get some help on this and if there are robust ways to free GPU memory fully, I would definitely be keen to include them.

Thank you!

Deleting all objects and references pointing to objects allocating GPU memory is the right approach and will free the memory. Calling empty_cache() will also clear the cache and free the memory (besides the memory used for the CUDA context).
Here is a small example:

import torch
import torch.nn as nn


def memory_stats():
    print(torch.cuda.memory_allocated()/1024**2)
    # memory_cached() was deprecated in favor of memory_reserved()
    print(torch.cuda.memory_reserved()/1024**2)


def allocate():
    x = torch.randn(1024*1024, device='cuda')
    memory_stats()
    
    
memory_stats()
# 0.0
# 0.0

allocate()
# 4.0 # allocated inside the function
# 20.0 # used cache

memory_stats()
# 0.0 # local tensor is free
# 20.0 # cache is still alive

torch.cuda.empty_cache()
memory_stats()
# 0.0
# 0.0 # cache is free again

x = torch.randn(1024, 1024, device='cuda')
memory_stats()
# 4.0
# 20.0

# store reference
y = x

del x # this does not free the memory of x since y still points to it
memory_stats() 
# 4.0  
# 20.0

del y # this allows PyTorch to free the memory and reuse it in the cache
memory_stats()
# 0.0
# 20.0

torch.cuda.empty_cache()
memory_stats() 
# 0.0
# 0.0

Thanks for the quick reply.

Is there a systematic way to perform these delete operations on the actual model, the optimizer, and the data moved onto the GPU while enumerating the data_loader, after training & inference?

For a simple single-tensor example I tried, it works fine and makes sense, but I can’t seem to figure out exactly where to place these deletes in the training loop to free the memory.

You can del any object you’ve created, but you would need to make sure no references to it are stored anywhere, as my example code shows.
You can replace the actual x tensor in my code with a real model etc. and you would see the same effect.
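As a hedged sketch (not your exact training code), the same pattern with a small model in place of the plain tensor might look like this:

```python
import gc

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(128, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data = torch.randn(32, 128, device=device)
loss = model(data).mean()
loss.backward()
optimizer.step()

# Delete every object holding GPU memory: the model parameters and their
# gradients, the optimizer (its state can hold CUDA tensors), the batch,
# and the loss (which keeps the computation graph alive).
del model, optimizer, data, loss
gc.collect()
torch.cuda.empty_cache()

if device == 'cuda':
    print(torch.cuda.memory_allocated())  # 0 if no references remain
```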
Users often report “memory leaks” that are actually caused by storing references to the computation graph and other tensors, e.g. when accumulating the loss for logging:

output = model(input)
loss = criterion(output, target)
losses += loss # !!! This will store the entire computation graph !!!

so you might want to check if you are running into similar issues.
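The usual fix, assuming you only need the scalar value for logging, is to detach the loss before accumulating it via .item():

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
criterion = nn.MSELoss()
input, target = torch.randn(8, 4), torch.randn(8, 1)

losses = 0.0
for _ in range(3):
    output = model(input)
    loss = criterion(output, target)
    # .item() returns a plain Python float, so no graph is kept alive
    losses += loss.item()
    loss.backward()

print(losses / 3)  # average loss for logging
```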