How to automatically free CUDA memory when reusing the same reference (variable name) in a torch operation?

Hi, here is a toy example demonstrating the issue:

import torch 

torch.cuda.set_device(3)
a = torch.rand(10000, 10000).cuda()
# monitor cuda:3 by "watch -n 0.01 nvidia-smi"
a = torch.add(a, 0.0)
# keep monitoring

When using the same variable name “a” on both sides of torch.add, I find that the old a’s memory is not freed on the GPU: it still exists even though the reference has been updated, and I cannot reach the original memory since the reference is gone. How can I make the memory of the old reference be freed automatically when using the same variable name on the left/right side of torch operations?

a will be freed automatically once no reference points to it anymore. Note, however, that during a = torch.add(a, 0.0) the old tensor is still referenced while the result is being computed, so both buffers exist briefly; the old one is released only after the name is rebound.
Also note that PyTorch uses a caching memory allocator, so nvidia-smi will show all allocated and cached memory as well as the memory used by the CUDA context. (In newer PyTorch releases, torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved.)
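The cached memory is reused by PyTorch for new tensors, so it is not wasted; but if you need it to show up as free in nvidia-smi, you can return the unused cached blocks to the driver. A minimal sketch (torch.cuda.empty_cache only releases cache that is not backing a live tensor):

import torch

a = torch.rand(1024, 1024, 128).cuda()
del a                     # drop the only reference; the 512 MB go back to the cache
torch.cuda.empty_cache()  # hand the unused cached blocks back to the driver
# nvidia-smi should now only show the memory used by the CUDA context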

Here is a small example demonstrating the allocation and caching behavior:

# Should be empty
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated  0.0
> cached  0.0

# Initial setup
a = torch.rand(1024, 1024, 128).cuda()
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated  512.0
> cached  512.0

# torch.add allocates a temporary output tensor, as it's not in-place
# (see the in-place sketch below)
a = torch.add(a, 0.0)
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated  512.0
> cached  1024.0

# Rebind the name, dropping the last reference to the tensor
a = 1.
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated  0.0
> cached  1024.0
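If you want to avoid the temporary allocation altogether, you can use the in-place variant, which writes the result directly into a's existing storage. A minimal sketch (be careful with in-place ops if autograd still needs the original values):

a = torch.rand(1024, 1024, 128).cuda()
a.add_(0.0)  # in-place equivalent of a = torch.add(a, 0.0); no second buffer is allocated
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
# allocated stays at 512.0, since no temporary tensor was needed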