Consider the following code snippet.
import torch

for i in range(10):
    x = torch.randn(10000, 10000).cuda()
    print(torch.cuda.max_memory_allocated())
On my computer, the output is
400556032
801112064
801112064
801112064
801112064
801112064
801112064
801112064
801112064
801112064
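Here is my back-of-envelope accounting for these numbers (assuming the default dtype float32, i.e. 4 bytes per element; I take the small excess over my figures to be allocator rounding):

```python
# One tensor from the loop above, at float32 (4 bytes per element):
elements = 10000 * 10000
bytes_per_tensor = elements * 4
print(bytes_per_tensor)        # 400000000, close to the reported 400556032

# max_memory_allocated() reports a PEAK. From the second iteration on,
# the new tensor is allocated BEFORE the name x lets go of the old one,
# so the peak is roughly two tensors' worth:
print(2 * bytes_per_tensor)    # 800000000, close to the reported 801112064
```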
It seems that PyTorch automatically frees GPU tensors that will no longer be used. How does it know? Isn't the name x only a reference to the tensor?
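To make the "only a reference" part of my question concrete, here is a CPU-only sketch using sys.getrefcount (my assumption being that CUDA tensors follow the same Python name-binding rules as CPU ones):

```python
import sys
import torch

x = torch.randn(4)           # small CPU tensor; no GPU needed for this
base = sys.getrefcount(x)    # note: getrefcount counts its own argument too

y = x                        # a second name for the SAME tensor object
assert sys.getrefcount(x) == base + 1

x = torch.randn(4)           # rebinding x drops one reference to the old
                             # tensor, but y still keeps it alive
assert sys.getrefcount(y) == base
```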
Now consider the following snippet.
import torch
x = torch.randn(100).cuda()
print(x.mean().item())
y = x # Not a copy: y is just another name for the same tensor
del x
torch.cuda.empty_cache()
print(y.mean().item())
On my computer, the two printed values are identical. This suggests that PyTorch does not actually "delete" the tensor, even though I explicitly ordered it to with del x followed by torch.cuda.empty_cache().
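To check what del actually does, I tried watching the tensor with a weak reference (a weakref does not keep its target alive; again a CPU tensor, assuming the binding rules are the same as on GPU):

```python
import weakref
import torch

x = torch.randn(4)
ref = weakref.ref(x)      # weak reference: does not keep the tensor alive

y = x                     # second strong reference to the same tensor
del x                     # removes the NAME x, not the tensor itself
print(ref() is not None)  # True: y still keeps the tensor alive

del y                     # the last strong reference is gone
print(ref() is None)      # True: now the tensor has been deallocated
```

If this is right, del x only unbinds the name, which would explain why y still prints the same mean.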
My question is, how do memory allocation and deallocation work in PyTorch?