How to totally free allocated memory in CUDA?

Let me use a simple example to demonstrate the issue:

import torch
a = torch.rand(10000, 10000).cuda()  # GPU memory usage: 865 MiB
del a
torch.cuda.empty_cache()   # 483 MiB still in use

That seems very strange: even though I use "del tensor" + torch.cuda.empty_cache(), more than half of the memory is still in use on the CUDA side (483 MiB in my case above). Is there any way to completely release this unused memory from the GPU once the old reference is no longer needed?

If you are checking the memory via nvidia-smi, note that the CUDA context itself also uses device memory; the 483 MiB you still see corresponds to that context, not to cached tensors.
The memory allocated and cached by PyTorch is freed by your code snippet, as the memory statistics show (in newer PyTorch releases torch.cuda.memory_cached() has been renamed to torch.cuda.memory_reserved()):

print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
> 0
> 0

a = torch.rand(10000, 10000).cuda()  # allocate a ~400 MB tensor on the GPU
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
> 400556032
> 400556032

del a  # the tensor is gone, but its memory stays in the allocator's cache
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
> 0
> 400556032

torch.cuda.empty_cache()   # return the cached blocks to the driver
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
> 0
> 0
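
If you want to confirm this from within the script rather than via nvidia-smi, you can query the driver directly. A minimal sketch, assuming a recent PyTorch build that provides torch.cuda.mem_get_info() (a wrapper around cudaMemGetInfo):

import torch

a = torch.rand(10000, 10000).cuda()
del a
torch.cuda.empty_cache()

# PyTorch's caching allocator now holds nothing ...
print(torch.cuda.memory_allocated())   # 0
print(torch.cuda.memory_reserved())    # 0

# ... but the driver still reports memory in use: that is the CUDA
# context (plus whatever other processes have allocated on this GPU).
free, total = torch.cuda.mem_get_info()
print(f"{(total - free) / 2**20:.0f} MiB still in use on the device")

That remaining footprint belongs to the CUDA context and is only released when the process exits, so torch.cuda.empty_cache() cannot reclaim it.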