Hello, everyone. I use `torch.cuda.max_memory_allocated()` to track the GPU memory usage of my program. I carefully delete tensors immediately after they are used, but the GPU still runs out of memory after several iterations.
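For context, here is a minimal sketch of how such a reading could be logged once per iteration. It assumes the numbers printed further down are the byte count divided by 1024², and the helper names `to_mib` and `log_cuda_memory` are hypothetical, not part of my actual program. It also prints `torch.cuda.memory_allocated()`, because `max_memory_allocated()` reports a running peak rather than the current usage, so seeing both may help tell a transient spike from tensors that stay alive.

```python
def to_mib(num_bytes):
    """Convert a byte count to mebibytes (assumed unit of the printed values)."""
    return num_bytes / (1024 ** 2)


def log_cuda_memory(tag=""):
    """Print current and peak allocated GPU memory in MiB, if CUDA is usable.

    Hypothetical helper for illustration only.
    """
    import torch  # imported lazily so to_mib stays usable without torch

    if not torch.cuda.is_available():
        print(tag, "CUDA not available")
        return
    current = to_mib(torch.cuda.memory_allocated())   # memory held by tensors right now
    peak = to_mib(torch.cuda.max_memory_allocated())  # highest value seen so far
    print(f"{tag} current {current:.1f} MiB, peak {peak:.1f} MiB")
```

Calling `log_cuda_memory("iter 1")` at the end of each iteration would show whether the current usage grows along with the peak.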

I then used the following snippet to list the tensors that are still alive.

```
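# Prints currently alive Tensors and Variables
import gc

import torch

for obj in gc.get_objects():
    try:
        # Match plain tensors as well as objects that wrap a tensor in .data
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        # Some tracked objects raise on attribute access; skip them
        pass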
```

After one SGD iteration, my program outputs

```
<class 'torch.Tensor'> torch.Size([100, 10])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([128, 784])
<class 'torch.Tensor'> torch.Size([128])
<class 'torch.Tensor'> torch.Size([64, 128])
<class 'torch.Tensor'> torch.Size([64])
<class 'torch.Tensor'> torch.Size([10, 64])
<class 'torch.Tensor'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([100, 109386])
<class 'torch.Tensor'> torch.Size([300, 1, 28, 28])
<class 'torch.Tensor'> torch.Size([300])
<class 'torch.Tensor'> torch.Size([300, 109386])
<class 'torch.Tensor'> torch.Size([60000, 28, 28])
<class 'torch.Tensor'> torch.Size([60000])
<class 'torch.Tensor'> torch.Size([10000, 28, 28])
<class 'torch.Tensor'> torch.Size([10000])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.nn.parameter.Parameter'> torch.Size([128, 784])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64, 128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10, 64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([50, 109386])
maximum graphics memory 2118.43798828125
```

The next SGD iteration shows exactly the same tensors:

```
<class 'torch.Tensor'> torch.Size([100, 10])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([])
<class 'torch.Tensor'> torch.Size([128, 784])
<class 'torch.Tensor'> torch.Size([128])
<class 'torch.Tensor'> torch.Size([64, 128])
<class 'torch.Tensor'> torch.Size([64])
<class 'torch.Tensor'> torch.Size([10, 64])
<class 'torch.Tensor'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([100, 109386])
<class 'torch.Tensor'> torch.Size([300, 1, 28, 28])
<class 'torch.Tensor'> torch.Size([300])
<class 'torch.Tensor'> torch.Size([300, 109386])
<class 'torch.Tensor'> torch.Size([60000, 28, 28])
<class 'torch.Tensor'> torch.Size([60000])
<class 'torch.Tensor'> torch.Size([10000, 28, 28])
<class 'torch.Tensor'> torch.Size([10000])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.nn.parameter.Parameter'> torch.Size([128, 784])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64, 128])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10, 64])
<class 'torch.nn.parameter.Parameter'> torch.Size([10])
<class 'torch.Tensor'> torch.Size([50, 109386])
maximum graphics memory 2245.8056640625
```

Yet the reported maximum rises from 2118.4 MB to 2245.8 MB. Why does this happen? Does Python’s gc miss some tensors? My program uses the operation below quite often, i.e., converting tensors to NumPy arrays.

```
a = torch.norm(tensor_T, dim=1).detach().cpu().numpy()
```

Could this operation let intermediate tensors escape Python’s garbage collection?

My program also has a lot of OrderedDicts whose keys are strings and whose values are tensors.
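To compare the two listings above mechanically rather than by eye, something like the following sketch could be used. It counts live tensors by type, shape, and device at the end of each iteration and reports only the entries whose count changed. The function names `snapshot_live_tensors` and `diff_snapshots` are hypothetical helpers written for this post, and the approach is the same `gc.get_objects()` scan as in my snippet, so it would share whatever blind spots that scan has.

```python
import gc
from collections import Counter


def snapshot_live_tensors():
    """Count live tensors, keyed by (type name, shape, device)."""
    import torch  # imported lazily so diff_snapshots stays usable without torch

    counts = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                key = (type(obj).__name__, tuple(obj.size()), str(obj.device))
                counts[key] += 1
        except Exception:
            # Some tracked objects raise on inspection; skip them
            pass
    return counts


def diff_snapshots(before, after):
    """Return {key: change in count} for every key whose count differs.

    Both arguments are expected to be Counter objects, so a missing key counts as 0.
    """
    keys = set(before) | set(after)
    return {k: after[k] - before[k] for k in keys if after[k] != before[k]}
```

Taking one snapshot at the end of each of two consecutive iterations and passing both to `diff_snapshots` should give an empty dict if the sets of live tensors really match. Including the device in the key also separates CPU tensors from GPU tensors, which the plain size listing does not.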

I tested my program on a Titan X Pascal (with CUDA 10.1, Python 3.6.9, and PyTorch 1.2.0) and a GeForce RTX 2080 Ti (with CUDA 10.0, Python 3.6.9, and PyTorch 1.2.0).

Thank you for your attention!