Why is a Tensor object not deleted immediately?

I have been reading some code around the CUDA caching allocator and have a question.

When I run the code below, the storage created at i = 0 is not freed when the first loop iteration ends; it is only freed when the second one ends.
As a result, the input tensor at i = 1 cannot reuse the memory of the input tensor at i = 0.

What makes this happen?

import torch                                                                  
from torch import nn                                                          
device, dtype = torch.device('cuda'), torch.float32                           
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)).to(device, dtype)
for i in range(0, 20):                                                        
  print("iter %d start" % (i))                                                
  input = torch.randn(16, 16, 128, 128).to(device)                            
  output = m(input)                                                           
  print("iter %d complete" % (i))                                             


Python makes this happen.
When you do out = f(inp), Python first evaluates f(inp), and only then rebinds out: the object previously bound to out is released and the new result is associated with the name.
As you can see, the deletion here happens after the new value has been computed.
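In CPython, which releases objects by reference counting, the order can be made visible with a tiny stand-in class (the names `Tracker` and `f` below are made up purely for illustration):

```python
class Tracker:
    """Records when instances are created and deallocated."""
    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracker.log.append("deleted " + self.name)


def f(name):
    # Stands in for an arbitrary computation that returns a new object.
    Tracker.log.append("computing " + name)
    return Tracker(name)


out = f("iter0")
out = f("iter1")  # f runs first; only then is the old "iter0" object released

assert Tracker.log == ["computing iter0", "computing iter1", "deleted iter0"]
```

The final assertion shows that "iter1" was fully computed before "iter0" was deleted, which is exactly why the old input still occupies memory while the new one is being allocated.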

A simple way around this is to move the loop body into a function: everything in the function body goes out of scope when the function returns, which happens before the current loop iteration ends.
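Applied to the code sample above, that workaround could look like the sketch below (I added a CPU fallback so it also runs without a GPU; the original uses CUDA):

```python
import torch
from torch import nn

# Fall back to CPU when CUDA is unavailable, so the sketch runs anywhere.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dtype = torch.float32
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)).to(device, dtype)

def step(i):
    # input and output are locals: they go out of scope when step() returns,
    # i.e. before the next loop iteration allocates a new input.
    input = torch.randn(16, 16, 128, 128).to(device)
    output = m(input)
    print("iter %d complete" % (i))

for i in range(0, 20):
    print("iter %d start" % (i))
    step(i)
```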

Oh, thanks! That makes sense.
But according to my log, the input tensor at i = 0 is deallocated after the convolution at i = 1.
From your explanation, I would expect the input tensor at i = 0 to be deallocated before that convolution.

What could have happened?

For the input in your code sample, on top of being referenced by the Python variable called "input", the Tensor is also referenced by the computational graph (because it is needed for the backward computation). And that computational graph is kept alive by output, which references it.
So the Tensor cannot be freed until output goes out of scope.
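This can be observed with a weakref on a small CPU example (nn.Linear here as a lightweight stand-in for the Conv2d; relying on the behavior of recent PyTorch versions, which keep the Python tensor object alive while C++ code still references it):

```python
import weakref
import torch
from torch import nn

m = nn.Linear(4, 4)       # stand-in for the Conv2d in the code above
inp = torch.randn(2, 4)
out = m(inp)

ref = weakref.ref(inp)    # track the input tensor's lifetime
del inp                   # drop the python variable

# Still alive: the graph saved it to compute the weight gradient later.
assert ref() is not None

del out                   # dropping output releases the graph...
assert ref() is None      # ...and with it the saved input tensor
```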

Note that if you call backward() on that graph (with retain_graph=False, the default), it will free up the graph, and that reference to input will go away. So input should get deallocated on the next iteration, as soon as the "input" Python variable is overwritten.
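A small CPU sketch of this case too (again using nn.Linear as a stand-in, and assuming a recent PyTorch version for the weakref behavior): here output stays alive, yet the saved input is released by backward() itself.

```python
import weakref
import torch
from torch import nn

m = nn.Linear(4, 4)
inp = torch.randn(2, 4)
out = m(inp).sum()

ref = weakref.ref(inp)
del inp
assert ref() is not None  # still held by the autograd graph

out.backward()            # retain_graph=False by default: frees the graph

# The saved input is released even though `out` is still in scope.
assert ref() is None
```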

Now everything’s clear to me.

Thanks for the explanation!
