Understanding GPU memory accumulation

I am facing an issue where GPU memory accumulates on each iteration of a for loop, but I am not sure why. I have replicated the problem in the very simple example below, with a few different variations.

import torch

a = torch.zeros(10000, 3).cuda().requires_grad_() + 500

def clip():
    c = a * 5.0

    c[:, 2] = torch.clip(c[:, 2], 0, 2)               # 1: in-place write of the clipped column back into c
    # d = torch.clip(c[:, 2], 0, 2)                   # 2: out-of-place clip into a new local tensor
    # e = torch.clip(c[:, 2], 0, 2).detach()          # 3: out-of-place clip, detached from the graph
    # c[:, 2] = torch.clip(c[:, 2], 0, 2).detach()    # 4: in-place write of a detached result
    # c[:, 2].clip_(0, 2)                             # 5: in-place clip_ directly on the slice view

for i in range(1000):
    clip()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # loss_photo computation here
    print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e6}MB")

Using option 1, the GPU memory accumulates across the for loop iterations.

None of the other options leads to GPU memory accumulation. I do not understand why this is happening. Since I have no reference to the variable c after each function call, shouldn't it go out of scope and free the memory of the tensor created by torch.clip?

My hunch is that it has something to do with the computation graph, because #4 does not lead to memory accumulation, though I am not sure of the actual explanation.
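
For reference, the fact that autograd records the indexed in-place assignment can be seen by printing c.grad_fn before and after the write. A minimal sketch (the exact node names depend on the PyTorch version):

import torch

a = torch.zeros(10000, 3).cuda().requires_grad_() + 500

c = a * 5.0
print(c.grad_fn)   # e.g. <MulBackward0 ...>: plain out-of-place multiply

c[:, 2] = torch.clip(c[:, 2], 0, 2)
print(c.grad_fn)   # e.g. <CopySlices ...>: the in-place slice assignment was recorded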

I would love to hear some insights or an explanation of what is going on here! Thank you so much!

Since I have no reference to the variable c after each function call, shouldn't it go out of scope and free the memory of the tensor created by torch.clip?

No. Option 1 assigns the output of torch.clip back into c via an in-place operation, and autograd records that in-place assignment, so the computation graph accumulates instead of being freed.
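
If you still need gradients to flow through the clamp, one workaround (a sketch, not from the original answer) is to avoid the in-place write altogether and rebuild c out of place, for example with torch.cat:

import torch

a = torch.zeros(10000, 3).cuda().requires_grad_() + 500

def clip():
    c = a * 5.0
    # Clamp the last column and concatenate it with the untouched columns
    # instead of writing into c in-place; no in-place op is recorded, so the
    # graph built here can be freed once c goes out of scope.
    c = torch.cat([c[:, :2], torch.clip(c[:, 2:], 0, 2)], dim=1)
    return c

Alternatively, if gradients through the clamped values are not needed, assigning the detached result (your option 4) already avoids the accumulation, as you observed.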