The gc module doesn't show new tensors, but graphics memory usage keeps increasing

Hello, everyone. I use torch.cuda.max_memory_allocated() to track the graphics memory usage of my program. I carefully delete tensors immediately after they are used, but the graphics card still runs out of memory after several iterations.

Then I use the following snippet to track the tensors that are still resident.

# prints currently alive Tensors and Variables
import torch
import gc

for obj in gc.get_objects():
    try:
        # catch plain tensors as well as objects (e.g. Parameters) wrapping one in .data
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        # some objects raise when inspected; skip them
        pass
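
For reference, the same loop can also report where each tensor lives and roughly how much memory it takes; a small variant (the outputs below still come from the simpler snippet above):

import gc
import torch

def list_tensors():
    # Same loop as above, but also report device and approximate storage size.
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
                t = obj if torch.is_tensor(obj) else obj.data
                size_mb = t.element_size() * t.nelement() / 1024 / 1024
                print(type(obj), tuple(t.size()), t.device, "{:.2f} MB".format(size_mb))
        except Exception:
            pass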

After one SGD iteration, my program outputs

<class 'torch.Tensor'> torch.Size([100, 10])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([128, 784])
<class 'torch.Tensor'> torch.Size([128])
<class 'torch.Tensor'> torch.Size([64, 128])
<class 'torch.Tensor'> torch.Size([64])
<class 'torch.Tensor'> torch.Size([10, 64])
<class 'torch.Tensor'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([100, 109386])
<class 'torch.Tensor'> torch.Size([300, 1, 28, 28])
<class 'torch.Tensor'> torch.Size([300])
<class 'torch.Tensor'> torch.Size([300, 109386])
<class 'torch.Tensor'> torch.Size([60000, 28, 28])
<class 'torch.Tensor'> torch.Size([60000])
<class 'torch.Tensor'> torch.Size([10000, 28, 28])
<class 'torch.Tensor'> torch.Size([10000])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.nn.parameter.Parameter'> torch.Size([128, 784])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64, 128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10, 64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([50, 109386])
maximum graphics memory 2118.43798828125

The next SGD iteration shows exactly the same tensors:

<class 'torch.Tensor'> torch.Size([100, 10])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([128, 784])
<class 'torch.Tensor'> torch.Size([128])
<class 'torch.Tensor'> torch.Size([64, 128])
<class 'torch.Tensor'> torch.Size([64])
<class 'torch.Tensor'> torch.Size([10, 64])
<class 'torch.Tensor'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([100, 109386])
<class 'torch.Tensor'> torch.Size([300, 1, 28, 28])
<class 'torch.Tensor'> torch.Size([300])
<class 'torch.Tensor'> torch.Size([300, 109386])
<class 'torch.Tensor'> torch.Size([60000, 28, 28])
<class 'torch.Tensor'> torch.Size([60000])
<class 'torch.Tensor'> torch.Size([10000, 28, 28])
<class 'torch.Tensor'> torch.Size([10000])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.nn.parameter.Parameter'> torch.Size([128, 784])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64, 128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10, 64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([50, 109386])
maximum graphics memory 2245.8056640625

But the memory goes up from 2118.4 MB to 2245.8 MB. Why does this happen? Does Python's gc miss some tensors? My program also uses the operation below quite often, i.e., it converts tensors to NumPy arrays:

a=torch.norm(tensor_T, dim=1).detach().cpu().numpy()

Could this operation let intermediate tensors evade Python's garbage collection?
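
One way to check that line in isolation (just a sketch; the shape is taken from my prints above and the names are only illustrative):

import torch

tensor_T = torch.randn(300, 109386, device='cuda')
before = torch.cuda.memory_allocated()
for _ in range(100):
    a = torch.norm(tensor_T, dim=1).detach().cpu().numpy()
after = torch.cuda.memory_allocated()
print("allocated delta in MB:", (after - before) / 1024 / 1024)
# If the delta stays near zero, the intermediate GPU tensor produced by
# torch.norm is freed as soon as its Python reference goes away, and the
# numpy result lives only in host memory.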

My program also uses a lot of OrderedDicts whose keys are strings and whose values are tensors.
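
One thing I am also wondering about: a stored value that still carries a grad_fn keeps its whole computation graph (and the GPU buffers in it) alive, so maybe that matters here. A tiny illustration of what I mean (made-up names and shapes, not my actual code):

import collections
import torch

stats = collections.OrderedDict()
w = torch.randn(128, 784, device='cuda', requires_grad=True)
x = torch.randn(300, 784, device='cuda')
loss = (x @ w.t()).pow(2).mean()
stats['loss'] = loss                    # still has a grad_fn, so it keeps the whole graph alive
stats['loss_detached'] = loss.detach()  # or loss.item(), which drops the graph reference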

I tested my program on a Titan X Pascal (CUDA 10.1, Python 3.6.9, PyTorch 1.2.0) and a GeForce RTX 2080 Ti (CUDA 10.0, Python 3.6.9, PyTorch 1.2.0).

Thank you for your attention!

But the memory goes up from 2118.4 MB to 2245.8 MB.

This is a small increase. It might simply be because the first iteration behaves slightly differently from the others (the first iteration only holds its own results, while during the second some buffers from the first might still be around).
Does the memory continue to grow for the other iterations?

Also you’re looking at the maximum allocated memory, not the current one.
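
For example, the two counters behave like this (small sketch, PyTorch 1.2-era names):

import torch

x = torch.randn(1024, 1024, device='cuda')  # allocate a ~4 MB tensor
print(torch.cuda.memory_allocated())        # bytes currently held by live tensors
print(torch.cuda.max_memory_allocated())    # high-water mark, never decreases unless reset
del x
print(torch.cuda.memory_allocated())        # drops back down
print(torch.cuda.max_memory_allocated())    # stays at the peak
# torch.cuda.reset_max_memory_allocated() resets the peak counter if you want
# a per-iteration maximum.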

Yes, it grows after every SGD iteration and the graphics memory quickly runs out. I have changed my tracking code to:

import collections
import gc
import torch

def graphics_memory():
    # currently allocated by live tensors, in MB
    xa = torch.cuda.memory_allocated() / 1024 / 1024
    # reserved by the caching allocator, in MB (memory_cached in PyTorch 1.2)
    xb = torch.cuda.memory_cached() / 1024 / 1024
    return max(xa, xb)

def memory_track():
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) or (hasattr(obj, "data") and torch.is_tensor(obj.data)):
                print(type(obj), obj.size())
            elif isinstance(obj, collections.OrderedDict):
                for key, val in obj.items():
                    if torch.is_tensor(val):
                        print("key {}, val {}".format(key, val.size()))
        except Exception:
            pass
    print("current memory allocated {}".format(graphics_memory()))

It still shows that the current memory usage keeps growing, even though the resident tensors and OrderedDicts stay the same.
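
For context, the tracker above is called after every SGD iteration, roughly like this (simplified sketch; train_step below is only a stand-in for my real update, not code from my program):

num_iterations = 10  # just for the sketch

def train_step():
    # stand-in for my real SGD update: allocate a temporary CUDA tensor and drop it
    tmp = torch.randn(300, 109386, device='cuda')
    del tmp

for it in range(num_iterations):
    train_step()
    print("=== after iteration {} ===".format(it))
    memory_track()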

Could you give a small code sample (40-50 lines) that reproduces this behavior so that I can investigate locally please?

After an arduous day of debugging, I finally found the bug. Thanks a lot! Have a nice day!

Out of curiosity, what was it?