I have been reading through some code related to the CUDA caching allocator and have a question.
When I execute the code below, the storage object created at i = 0 is not deleted immediately when the first for-loop iteration ends, but only when the second one ends.
As a result, the input tensor at i = 1 can't reuse the memory of the input tensor at i = 0.
What makes this happen?
import torch
from torch import nn
torch.manual_seed(0)
device, dtype = torch.device('cuda'), torch.float32
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)).to(device, dtype)
for i in range(20):
    print("iter %d start" % i)
    # `input` and `output` from the previous iteration are still bound
    # here, so their storages are still alive when this allocation runs.
    input = torch.randn(16, 16, 128, 128).to(device)
    output = m(input)
    print("iter %d complete" % i)