I am facing an issue with GPU memory accumulating on each iteration of a for loop, but I am not sure why. I have reproduced the problem in the minimal example below, with a few variations.
```python
import torch

a = torch.zeros(10000, 3).cuda().requires_grad_() + 500

def clip():
    c = a * 5.0
    c[:, 2] = torch.clip(c[:, 2], 0, 2)             # 1
    # d = torch.clip(c[:, 2], 0, 2)                 # 2
    # e = torch.clip(c[:, 2], 0, 2).detach()        # 3
    # c[:, 2] = torch.clip(c[:, 2], 0, 2).detach()  # 4
    # c[:, 2].clip_(0, 2)                           # 5

for i in range(1000):
    clip()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # loss_photo computation here
    print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e6}MB")
```
With option 1, GPU memory accumulates across the for loop. None of the other options leads to any accumulation. I do not understand why this happens: since I keep no reference to `c` after each call to `clip()`, shouldn't it go out of scope and free the memory of the tensor created by `torch.clip`?
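One experiment that might narrow this down (my assumption being that reference cycles, which only the garbage collector can reclaim, are involved) is to force a gc pass each iteration and see whether the accumulation disappears:

```python
import gc

for i in range(1000):
    clip()
    gc.collect()  # reclaim any reference cycles immediately
    torch.cuda.empty_cache()
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e6:.1f}MB")
```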
My hunch is that it has something to do with the computational graph, because option 4 does not lead to memory accumulation, though I am not sure of the actual explanation.
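To poke at that hunch, the `grad_fn` attribute can be printed inside `clip()` for each variant; the node names in the comments below are what I would expect from autograd, so treat them as assumptions:

```python
def clip_debug():
    c = a * 5.0
    print(c.grad_fn)  # MulBackward0: c starts out attached to a's graph
    c[:, 2] = torch.clip(c[:, 2], 0, 2)             # option 1
    print(c.grad_fn)  # CopySlices: the in-place copy is recorded in the graph

    c2 = a * 5.0
    c2[:, 2] = torch.clip(c2[:, 2], 0, 2).detach()  # option 4
    print(c2.grad_fn)  # also CopySlices, but the copied-in value carries no graph
```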
I would love to hear some insights or an explanation of what is going on here! Thank you so much!