Clearing the GPU is a headache

Hi all, before moving my model to the GPU I run the following function:

import gc
import torch

def empty_cached():
    gc.collect()
    torch.cuda.empty_cache()

The idea being that it would clear the GPU of the previous model I was playing with.

Here’s a scenario: I start training with a ResNet-18 and after a few epochs I notice the results are not that good, so I interrupt training, change the model, and run the function above.
When I do it that way I get:

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.17 GiB total capacity; 10.56 GiB already allocated; 9.81 MiB free; 10.85 GiB reserved in total by PyTorch)

However, if I interrupt training, restart the kernel, and run the same model that wouldn’t work before, it now works.

It seems like nothing works as well as restarting the kernel. What would come closest to it?


You would have to delete all references to the tensors you would like to free.
gc.collect() and empty_cache() will not free tensors that are still used and referenced.


I see!

How can I delete references to the tensors?

I tried it with del (model, optimizer, criterion), but when I query the GPU memory before and after, it seems to be the same.
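One subtlety worth illustrating: deleting the model alone is often not enough, because the optimizer keeps its own references to the model's parameters, which keeps them alive. Below is a small sketch (the model and optimizer here are hypothetical placeholders, not taken from the thread) that uses Python's weakref module to observe when a parameter actually becomes collectible:

```python
import gc
import weakref

import torch
import torch.nn as nn

# Hypothetical setup: a tiny model and an optimizer.
# The optimizer's param_groups hold references to the model's parameters.
model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Track a parameter without keeping a strong reference to it.
param_ref = weakref.ref(model.weight)

del model
gc.collect()
# Still alive: the optimizer references the parameter,
# so its memory (CPU or GPU) cannot be released yet.
print(param_ref() is not None)  # True

del optimizer
gc.collect()
# Now every strong reference is gone and the tensor is collected.
# On the GPU you would call torch.cuda.empty_cache() afterwards
# to return the now-unused cached blocks to the driver.
print(param_ref() is None)  # True
```

So deleting the model, the optimizer, and any variables that still hold outputs or losses (which drag the whole computation graph along) is what actually lets the memory go.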

del tensor should work. To release the cached memory back to the driver, you would need to call torch.cuda.empty_cache() afterwards.

Here is a small example (values are in MiB; torch.cuda.memory_cached() has since been renamed to torch.cuda.memory_reserved() in newer PyTorch versions):

import torch

print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

x = torch.randn(1024 * 1024).cuda()

# 4 MiB allocated (1M float32 values * 4 bytes) and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

y = torch.randn(8 * 1024 * 1024).cuda()

# 4 + 32 = 36 MiB allocated and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del x

# 32 MiB allocated, the cache should stay the same
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()

# 32 MiB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del y

# 0 MiB allocated, 32 MiB cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()

# 0 MiB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

Amazing, it works great!

I was using nvidia-smi to determine the memory allocation on the GPU. What is the difference between that and torch.cuda.memory_allocated()?

nvidia-smi shows all processes that use memory on the device.
So besides PyTorch, other processes might of course be using memory as well.
Also, note that PyTorch loads the CUDA kernels, cuDNN, the CUDA runtime, etc. on the first CUDA operation, which will also allocate memory (and cannot be freed until the process exits).
Depending on the device, CUDA version, etc., this CUDA context might take ~700MB.
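To make the difference concrete, here is a small helper that contrasts PyTorch's own counters with the device-wide view that nvidia-smi reports. This is a sketch, not from the thread: it assumes torch.cuda.mem_get_info() is available (it returns the free and total device memory in bytes, i.e. roughly what nvidia-smi sees, including the CUDA context and other processes):

```python
import torch

def bytes_to_mib(n):
    # Convert a byte count to MiB, matching nvidia-smi's units.
    return n / 1024**2

def memory_report():
    """Contrast PyTorch's counters with the device-wide view.

    memory_allocated(): bytes currently held by live tensors.
    memory_reserved():  bytes held by PyTorch's caching allocator.
    mem_get_info():     (free, total) for the whole device, which is
                        what nvidia-smi reports; the gap between this
                        and memory_reserved() includes the CUDA context
                        and any other processes using the GPU.
    """
    if not torch.cuda.is_available():
        return None  # no GPU present; nothing to report
    free, total = torch.cuda.mem_get_info()
    return {
        "allocated_mib": bytes_to_mib(torch.cuda.memory_allocated()),
        "reserved_mib": bytes_to_mib(torch.cuda.memory_reserved()),
        "device_used_mib": bytes_to_mib(total - free),
    }

print(memory_report())
```

On a machine where only a small tensor is allocated, you would typically see device_used_mib exceed reserved_mib by several hundred MiB, which is the context overhead mentioned above.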

@ptrblck sir, I have the same problem here. Can you help me determine where to put torch.cuda.empty_cache()?

Is there a way to programmatically clear the CUDA context?

I would like to release the memory allocated by the CUDA context after model training, because my process needs to do other operations afterwards.

No, you cannot delete the CUDA context while the PyTorch process is still running; you would have to shut down the current process and use a new one for the downstream application.
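One way to follow this advice without restarting your whole script is to run the training in a child process, so that when the child exits, the driver reclaims everything including the CUDA context. Below is a hedged sketch (the train function and its result are hypothetical placeholders): it uses the "spawn" start method, which is the one that works with CUDA, and passes the result back through a queue:

```python
import multiprocessing as mp

def train(queue):
    # Hypothetical training job: all CUDA work stays inside this child
    # process, including the CUDA context, so the GPU is fully released
    # when the child exits.
    # import torch
    # model = MyModel().cuda()
    # ... training loop ...
    result = {"final_loss": 0.123}  # placeholder for real training output
    queue.put(result)

def run_training_isolated():
    # "spawn" gives the child a fresh interpreter, which is required
    # when the child uses CUDA.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=train, args=(queue,))
    proc.start()
    result = queue.get()
    proc.join()
    return result

if __name__ == "__main__":
    result = run_training_isolated()
    # Downstream work now runs in the parent with a clean GPU.
    print(result)
```

The trade-off is that the model itself does not survive the child process, so you would save a checkpoint to disk inside train() and load it later if you need the weights afterwards.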

Is there a convenient method for deleting all the model tensors in one go?