I have been following some code around the CUDA caching allocator and have a question.
When I run the code below, the storage object created at i = 0 is not deleted as soon as the first loop iteration ends, but only when the second one ends.
So the input tensor at i = 1 cannot reuse the memory of the input tensor at i = 0.
What makes this happen?
import torch
from torch import nn

device, dtype = torch.device('cuda'), torch.float32
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)).to(device, dtype)
for i in range(20):
    print("iter %d start" % i)
    input = torch.randn(16, 16, 128, 128).to(device)
    output = m(input)
    print("iter %d complete" % i)
Python makes this happen.
When you do out = f(inp), Python first evaluates f(inp), then deletes the old content of out, and only then associates the new result with out.
As you can see, the deletion happens after the new value has been computed.
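You can see this ordering in plain Python. Here is a small sketch (my own example, not from the thread) using a class with a __del__ method to log when the old object is destroyed:

```python
class Tracked:
    # minimal class whose destructor logs when the object dies
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print("deleted", self.name)

def f(x):
    print("computing f")
    return Tracked("new")

out = Tracked("old")
out = f(out)
# prints "computing f" first, then "deleted old":
# the old value of `out` is only dropped after f() has returned
```

In CPython, reference counting makes this deterministic: the old object's refcount only reaches zero when the name is rebound, which happens after the right-hand side is fully evaluated.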
A simple way around this is to move the loop body into a function: everything in the function body goes out of scope when the function returns, before the next iteration of the loop begins.
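A minimal sketch of that workaround, using the same conv module but a smaller input, and a CPU fallback so it runs anywhere (both are my additions for illustration):

```python
import torch
from torch import nn

# fall back to CPU so the sketch runs anywhere (the original uses CUDA)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)).to(device)

def step():
    # `input` and `output` are locals, so both go out of scope when
    # step() returns, before the next loop iteration allocates new ones
    input = torch.randn(2, 16, 32, 32).to(device)
    output = m(input)
    return output.shape

for i in range(20):
    step()
```

Because the tensors die when step() returns, the allocator can hand the same memory back to the allocation in the next iteration.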
Oh Thanks! That makes sense.
But according to my log, the input tensor from i = 0 is only deallocated after the convolution at i = 1.
Based on your explanation, it should be deallocated before that convolution.
What could have happened?
In your code sample, on top of being referenced by the Python variable called "input", the Tensor is also referenced by the computational graph (because it is needed for the backward computation). And that computational graph is kept alive by output, which references it.
So the Tensor cannot be freed until output goes out of scope.
Note that if you call backward() on that graph (with retain_graph=False, the default), it will free the graph, and the graph's reference to input will go away.
So input should get deallocated on the next iteration, as soon as the "input" Python variable is overwritten.
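One way to observe that backward() frees the graph, sketched on CPU with a toy expression instead of the conv (my own example, not from the thread): a second backward() call fails precisely because the first one released the saved tensors.

```python
import torch

inp = torch.randn(3, requires_grad=True)
out = (inp * inp).sum()   # the graph saves `inp` for the backward pass

out.backward()            # retain_graph=False by default: the graph, and
                          # its reference to `inp`, are freed here

graph_freed = False
try:
    out.backward()        # the saved tensors are gone, so this raises
except RuntimeError:
    graph_freed = True
print(graph_freed)        # prints True
```

Once the graph is freed, only the Python variable keeps the input alive, so rebinding it on the next iteration releases the memory.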
Now everything's clear to me.
Thanks for the explanation!