Hi, here is a toy example of this issue:
import torch
torch.cuda.set_device(3)
a = torch.rand(10000, 10000).cuda()
# monitor cuda:3 with "watch -n 0.01 nvidia-smi"
a = torch.add(a, 0.0)
# keep monitoring
When I reuse the same variable name a on both sides of torch.add, I find the old tensor's memory is not freed on the GPU: it still shows up in nvidia-smi even though the reference has been replaced, and I can no longer reach the original memory since the reference is gone. How can I make the memory behind the old reference be freed automatically when the same variable name is used on the left and right of a torch operation?
The tensor behind a will be freed automatically once no reference points to it.
Note that PyTorch uses a caching memory allocator, so nvidia-smi
will show all allocated and cached memory, plus the memory used by the CUDA context itself.
Here is a small example to demonstrate this behavior:
# Should be empty
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated 0.0
> cached 0.0
# Initial setup
a = torch.rand(1024, 1024, 128).cuda()
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated 512.0
> cached 512.0
# torch.add is not in-place, so it allocates a new tensor for the result
a = torch.add(a, 0.0)
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated 512.0
> cached 1024.0
# Rebind a, dropping the last reference to the tensor
a = 1.
print('allocated ', torch.cuda.memory_allocated() / 1024**2)
print('cached ', torch.cuda.memory_cached() / 1024**2)
> allocated 0.0
> cached 1024.0
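If you want to avoid the temporary allocation in the first place, you can use the in-place variant a.add_(0.0), which writes into the existing storage; and if you want the cached blocks returned to the driver so that nvidia-smi reflects the drop, torch.cuda.empty_cache() does that (the CUDA context memory itself stays). A minimal sketch, guarded so it also runs without a GPU:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

a = torch.rand(1024, 1024, 128, device=device)

# In-place ops (trailing underscore) reuse the tensor's storage,
# so no temporary tensor is allocated for the result.
b = a.add_(0.0)
assert b is a  # same tensor object, same storage

# Dropping the last references returns the block to PyTorch's cache;
# empty_cache() then releases unused cached blocks back to the driver,
# which is what nvidia-smi reports.
del a, b
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```

Note that calling empty_cache() is usually unnecessary, since the cached memory is reused by later allocations; it mainly helps when other processes need the GPU memory.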