The idea being that it will clear out of the GPU the previous model I was playing with.
Here's a scenario: I start training with a resnet18, and after a few epochs I notice the results are not that good, so I interrupt training, change the model, and run the function above.
When I do it that way I get:
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.17 GiB total capacity; 10.56 GiB already allocated; 9.81 MiB free; 10.85 GiB reserved in total by PyTorch)
However, if I interrupt training, restart the kernel, and run the same model that wouldn't work before, it now works.
It seems like nothing works as well as restarting the kernel. What would come the closest to it?
You would have to delete all references to the tensors you would like to free. gc.collect() and torch.cuda.empty_cache() will not free tensors that are still used and referenced.
del tensor should work. To release the cached memory, you would need to call torch.cuda.empty_cache() afterwards.
Here is a small example:
import torch

# Nothing allocated yet
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)  # memory_cached() in older releases

x = torch.randn(1024 * 1024).cuda()
# 4MB allocated (1M float32 values) and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

y = torch.randn(8 * 1024 * 1024).cuda()
# 4 + 32 = 36MB allocated and potentially a larger cache
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del x
# 32MB allocated, the cache stays the same
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()
# 32MB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

del y
# 0MB allocated, 32MB cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)

torch.cuda.empty_cache()
# 0MB allocated and cached
print(torch.cuda.memory_allocated() / 1024**2)
print(torch.cuda.memory_reserved() / 1024**2)
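Applied to the training scenario above, the same pattern frees an old model before building the new one. A minimal sketch, assuming CUDA is available (the Linear model, variable names, and learning rate are placeholder stand-ins for the actual resnet18 setup, and it falls back to the CPU so it runs anywhere):

```python
import gc
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the interrupted training run's objects
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

before = torch.cuda.memory_allocated() if device == "cuda" else 0

# Free the old model before creating the new one:
del model, optimizer        # drop all Python references
gc.collect()                # collect anything caught in reference cycles
torch.cuda.empty_cache()    # return the cached blocks to the driver

after = torch.cuda.memory_allocated() if device == "cuda" else 0
print(after <= before)  # allocation dropped (or stayed at 0 on CPU)
```

Note that any other reference — a stored output, a `loss` tensor kept with its computation graph, or a traceback holding on to local frames after an exception — will keep the memory alive despite the `del`.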
nvidia-smi shows all processes which use memory on the device.
So besides PyTorch, other processes might of course use memory as well.
Also, note that PyTorch loads the CUDA kernels, cuDNN, the CUDA runtime, etc. after the first CUDA operation, which will also allocate memory (and cannot be freed until the script exits).
Depending on the device, CUDA version, etc., this CUDA context might take ~700MB.
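To see this overhead, compare PyTorch's own counter with the per-process number in nvidia-smi. A small sketch, assuming a CUDA-capable machine; torch.cuda.init() just forces the context creation that the first CUDA operation would otherwise trigger:

```python
import torch

if torch.cuda.is_available():
    torch.cuda.init()  # create the CUDA context (kernels, cuDNN, runtime)
    # PyTorch's counters only track tensor allocations, so this is still 0:
    print(torch.cuda.memory_allocated())
    # ...while `nvidia-smi` already reports several hundred MB for this
    # process: that difference is the CUDA context.
```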
No, you cannot delete the CUDA context while the PyTorch process is still running; you would have to shut down the current process and use a new one for the downstream application.