Hello. During training I need to unfold some feature maps of my network, which is CUDA-memory consuming. The program crashes with an "out of CUDA memory" error after a few training loops. However, the variables I allocate inside the `for` statement should be local, so I don't understand why memory runs out after several successful loops; I would expect the memory consumption to be the same in every loop. Can anyone help me out? Thanks!
Two methods which I frequently use for debugging:
By @smth:

```python
import gc
import os
import sys

import psutil
import torch

def memReport():
    for obj in gc.get_objects():
        if torch.is_tensor(obj):
            print(type(obj), obj.size())

def cpuStats():
    print(sys.version)
    print(psutil.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info().rss / 2. ** 30  # resident memory use in GB
    print('memory GB:', memoryUse)

cpuStats()
memReport()
```
Edited by @smth for PyTorch 0.4 and above, which doesn't need the `Variable` wrapper.
Thanks! Does Python's gc collect garbage as soon as a variable has no references, or with a delay?
@chenchr it does immediately, unless you have reference cycles.
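This is easy to see with `weakref`; a minimal CPython sketch (not PyTorch-specific; `Payload` is just a placeholder class) showing that a reference-counted object dies immediately on reassignment, while a reference cycle survives until the collector runs:

```python
import gc
import weakref

gc.disable()  # make the cycle case deterministic: no automatic collection

class Payload:
    pass

# Plain reassignment: the old object is freed immediately (refcount hits 0).
obj = Payload()
ref = weakref.ref(obj)
obj = Payload()          # old Payload has no references left
print(ref() is None)     # True: freed the moment it was replaced

# Reference cycle: freeing is deferred until the garbage collector runs.
a = Payload()
b = Payload()
a.partner = b
b.partner = a
ref2 = weakref.ref(a)
a = None
b = None
print(ref2() is None)    # False: the cycle keeps both objects alive
gc.collect()
print(ref2() is None)    # True: only after an explicit collection

gc.enable()
```

Note this immediate freeing is a CPython refcounting detail, not a language guarantee.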
Thanks! Do you mean that in

```python
def func():
    a = Variable(torch.randn(2, 2))
    a = Variable(torch.randn(100, 100))
    return
```
the memory allocated by `a = Variable(torch.randn(2, 2))` will be freed as soon as `a = Variable(torch.randn(100, 100))` is executed?
But don't forget that once you call `a = Variable(torch.rand(2, 2))`, `a` holds the data. When you then call `a = Variable(torch.rand(100, 100))`, first `Variable(torch.rand(100, 100))` is allocated (so the first tensor is still in memory), then it is assigned to `a`, and only then is `Variable(torch.rand(2, 2))` freed.
Does that mean there has to be enough memory for two variables during the creation of the second one?
That means that if you have something like:

```python
a = torch.rand(1024, 1024, 1024)  # 4GB
# the following line allocates 4GB extra before the assignment,
# so you need to have 8GB in order for it to work
a = torch.rand(1024, 1024, 1024)  # now you only use 4GB
```
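The doubled peak can be observed without a GPU; a minimal sketch using `tracemalloc` with `bytearray`s as stand-ins for tensors (`del`-ing the old object before allocating the new one is the usual way to avoid the extra peak):

```python
import tracemalloc

SIZE = 50 * 1024 * 1024  # ~50 MB per buffer

# Case 1: plain reassignment. The old buffer is still referenced while
# the new one is allocated, so the peak is roughly 2x one buffer.
tracemalloc.start()
a = bytearray(SIZE)
a = bytearray(SIZE)          # both buffers briefly alive
_, peak_reassign = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Case 2: del first. The old buffer is freed before the new allocation,
# so the peak stays at roughly 1x one buffer.
tracemalloc.start()
b = bytearray(SIZE)
del b                        # drop the reference before reallocating
b = bytearray(SIZE)
_, peak_del = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(peak_reassign > 1.5 * peak_del)  # True: reassignment doubles the peak
```

The same reasoning applies to the CUDA case above: `del a` before the second `torch.rand(1024, 1024, 1024)` keeps the peak at 4GB instead of 8GB.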