Here’s a simple repro of my situation:
import torch
import torch.nn as nn

feature_size = 1000
window = torch.zeros((5, feature_size, 1), requires_grad=True).cuda()
input_encoder = nn.Linear(3 * 224 * 224, feature_size).cuda()
step_count = 0
while True:
    # Create new element
    new_element = torch.zeros((1, 3, 224, 224), requires_grad=True).cuda()
    new_element = input_encoder(new_element.reshape(1, -1))
    # Remove last element and add the new one
    window = window[:-1]
    window = torch.cat((new_element.unsqueeze(-1), window), dim=0)
    step_count += 1
    print("Step: {}".format(step_count))
On my Titan Xp (12 GB memory) this runs out of memory after just over 9300 steps. I first hit it with PyTorch 1.0.0 and CUDA 8, and reproduced it with PyTorch 1.1.0 and CUDA 10, in both cases with Python 3.5 on Windows 10.
Ideally, each step would add a new element to window, remove the oldest one, and free all resources associated with the removed element. I'm not sure how best to free the old one, though: neither gc.collect() nor torch.cuda.empty_cache() has fixed the problem.
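For reference, this is roughly what I tried at the end of each loop iteration (it made no difference to the growth):

```python
import gc
import torch

# Attempted cleanup after each window update; neither call helped.
collected = gc.collect()      # collect unreachable Python objects
torch.cuda.empty_cache()      # return cached, unused blocks to the CUDA driver
print("collected:", collected)
```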
If I define memReport (with import gc):

import gc

def memReport():
    for obj in gc.get_objects():
        if torch.is_tensor(obj):
            print(type(obj), obj.size())
and run it in the loop, the result is always:
<class 'torch.Tensor'> torch.Size([1, 1000])
<class 'torch.Tensor'> torch.Size([5, 1000, 1])
<class 'torch.nn.parameter.Parameter'> torch.Size([1000, 150528])
<class 'torch.nn.parameter.Parameter'> torch.Size([1000])
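So the set of live tensors never changes, which makes me suspect that whatever is growing isn't visible as a Python object. One thing I did notice: the autograd graph hanging off window seems to get bigger every step, because each torch.cat is recorded on top of the slice of the previous window. A small CPU-only sketch of that check (count_nodes and the smaller sizes here are just for illustration, not from my real code):

```python
import torch
import torch.nn as nn

feature_size = 8
window = torch.zeros((5, feature_size, 1), requires_grad=True)
encoder = nn.Linear(4, feature_size)

def count_nodes(fn, seen=None):
    # Count the distinct nodes reachable in the recorded autograd graph.
    seen = set() if seen is None else seen
    if fn is None or fn in seen:
        return 0
    seen.add(fn)
    return 1 + sum(count_nodes(next_fn, seen) for next_fn, _ in fn.next_functions)

sizes = []
for step in range(3):
    new_element = encoder(torch.zeros((1, 4), requires_grad=True))
    # Same update pattern as the repro above, on CPU
    window = torch.cat((new_element.unsqueeze(-1), window[:-1]), dim=0)
    sizes.append(count_nodes(window.grad_fn))

print(sizes)  # the graph gets strictly bigger every step
```

If that graph (and the intermediate buffers it holds for backward) is what's eating the memory, that would explain why memReport shows nothing new.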
Anyone have any advice or pointers on freeing resources and preventing this memory leak?
Thanks!