Clearing the GPU is a headache

Hi all, before moving my model to the GPU I run the following function:

import gc
import torch

def empty_cached():
    gc.collect()
    torch.cuda.empty_cache()

The idea being that it would clear the GPU of the previous model I was playing with.

Here’s a scenario: I start training with a ResNet-18 and after a few epochs I notice the results are not that good, so I interrupt training, change the model, and run the function above.
When I do it that way I get:

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.17 GiB total capacity; 10.56 GiB already allocated; 9.81 MiB free; 10.85 GiB reserved in total by PyTorch)

However, if I interrupt training, restart the kernel, and run the same model that wouldn’t work before, it now works.

It seems like nothing works as well as restarting the kernel. What would come closest to it?


You would have to delete all references to the tensors you would like to free.
gc.collect() and empty_cache() will not free tensors that are still used and referenced.


I see!

How can I delete references to the tensors?

I tried it with del (model, optimizer, criterion), but when I query the GPU memory before and after, it seems to be the same.
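One subtlety worth illustrating: deleting the model alone is often not enough, because the optimizer keeps its own references to the model's parameters, which keeps them alive. Below is a small sketch (the model and optimizer here are hypothetical placeholders, not taken from the thread) that uses Python's weakref module to observe when a parameter actually becomes collectible:

```python
import gc
import weakref

import torch
import torch.nn as nn

# Hypothetical setup: a tiny model and an optimizer.
# The optimizer's param_groups hold references to the model's parameters.
model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Track a parameter without keeping a strong reference to it.
param_ref = weakref.ref(model.weight)

del model
gc.collect()
# Still alive: the optimizer references the parameter,
# so its memory (CPU or GPU) cannot be released yet.
print(param_ref() is not None)  # True

del optimizer
gc.collect()
# Now every strong reference is gone and the tensor is collected.
# On the GPU you would call torch.cuda.empty_cache() afterwards
# to return the now-unused cached blocks to the driver.
print(param_ref() is None)  # True
```

So deleting the model, the optimizer, and any variables that still hold outputs or losses (which drag the whole computation graph along) is what actually lets the memory go.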

del tensor should work. To release the cached memory back to the driver, you would need to call torch.cuda.empty_cache() afterwards.

Here is a small example (values are in MiB; torch.cuda.memory_cached() has since been renamed to torch.cuda.memory_reserved() in newer PyTorch versions):

import torch

print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

x = torch.randn(1024 * 1024).cuda()

# 4 MiB allocated (1M float32 values * 4 bytes) and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

y = torch.randn(8 * 1024 * 1024).cuda()

# 4 + 32 = 36 MiB allocated and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del x

# 32 MiB allocated, the cache should stay the same
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()

# 32 MiB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del y

# 0 MiB allocated, 32 MiB cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()

# 0 MiB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

Amazing, it works great!

I was using nvidia-smi to determine the memory allocation on the GPU. What is the difference between that and torch.cuda.memory_allocated()?

nvidia-smi shows all processes that use memory on the device.
So besides PyTorch, other processes might of course be using memory as well.
Also, note that PyTorch loads the CUDA kernels, cuDNN, the CUDA runtime, etc. on the first CUDA operation, which will also allocate memory (and cannot be freed until the process exits).
Depending on the device, CUDA version, etc., this CUDA context might take ~700MB.
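To make the difference concrete, here is a small helper that contrasts PyTorch's own counters with the device-wide view that nvidia-smi reports. This is a sketch, not from the thread: it assumes torch.cuda.mem_get_info() is available (it returns the free and total device memory in bytes, i.e. roughly what nvidia-smi sees, including the CUDA context and other processes):

```python
import torch

def bytes_to_mib(n):
    # Convert a byte count to MiB, matching nvidia-smi's units.
    return n / 1024**2

def memory_report():
    """Contrast PyTorch's counters with the device-wide view.

    memory_allocated(): bytes currently held by live tensors.
    memory_reserved():  bytes held by PyTorch's caching allocator.
    mem_get_info():     (free, total) for the whole device, which is
                        what nvidia-smi reports; the gap between this
                        and memory_reserved() includes the CUDA context
                        and any other processes using the GPU.
    """
    if not torch.cuda.is_available():
        return None  # no GPU present; nothing to report
    free, total = torch.cuda.mem_get_info()
    return {
        "allocated_mib": bytes_to_mib(torch.cuda.memory_allocated()),
        "reserved_mib": bytes_to_mib(torch.cuda.memory_reserved()),
        "device_used_mib": bytes_to_mib(total - free),
    }

print(memory_report())
```

On a machine where only a small tensor is allocated, you would typically see device_used_mib exceed reserved_mib by several hundred MiB, which is the context overhead mentioned above.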

@ptrblck sir, I have the same problem here. Can you help me determine where to put torch.cuda.empty_cache()?

Is there a way to programmatically clear the CUDA context?

I would like to release the memory allocated by the CUDA context after model training, because my process needs to do other operations afterwards.

No, you cannot delete the CUDA context while the PyTorch process is still running; you would have to shut down the current process and use a new one for the downstream application.
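One way to follow this advice without restarting your whole script is to run the training in a child process, so that when the child exits, the driver reclaims everything including the CUDA context. Below is a hedged sketch (the train function and its result are hypothetical placeholders): it uses the "spawn" start method, which is the one that works with CUDA, and passes the result back through a queue:

```python
import multiprocessing as mp

def train(queue):
    # Hypothetical training job: all CUDA work stays inside this child
    # process, including the CUDA context, so the GPU is fully released
    # when the child exits.
    # import torch
    # model = MyModel().cuda()
    # ... training loop ...
    result = {"final_loss": 0.123}  # placeholder for real training output
    queue.put(result)

def run_training_isolated():
    # "spawn" gives the child a fresh interpreter, which is required
    # when the child uses CUDA.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=train, args=(queue,))
    proc.start()
    result = queue.get()
    proc.join()
    return result

if __name__ == "__main__":
    result = run_training_isolated()
    # Downstream work now runs in the parent with a clean GPU.
    print(result)
```

The trade-off is that the model itself does not survive the child process, so you would save a checkpoint to disk inside train() and load it later if you need the weights afterwards.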

Is there a convenient method for deleting all the model tensors in one go?