Memory Leak after moving tensor to CPU

Hi everyone, I have the following two cases in the iteration loop of a single epoch. Simplified code is below.

Case 1

for index, data in enumerate(train_loader):
   input = data['input']
   output = model(input)

Case 2

num_train_examples = 50
output_size = 20
outputs = np.zeros((num_train_examples, output_size))
for index, data in enumerate(train_loader):
   input = data['input']
   # Provides the index of training points in this batch
   indices = data['index'] 
   output = model(input).detach().cpu().numpy()
   outputs[indices] = np.array(output)
   del output

In Case 1, memory usage stays constant, but in Case 2 it increases with the number of loop iterations.

I monitor memory usage with a snippet like this inside the iteration loop:

mem = psutil.virtual_memory()
print(f' {mem.percent:5} - {**3:10.2f} - {mem.available/1024**3:10.2f} - {mem.used/1024**3:10.2f}')
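
Note that psutil.virtual_memory() reports system-wide figures, so other processes on the machine can influence the numbers. A minimal sketch, assuming psutil is installed, that tracks only the current process's resident set size instead. The helper name rss_gib and the three-iteration loop are hypothetical placeholders, not part of the original code:

```python
import os

import psutil

# Handle to the current Python process only (not the whole machine).
process = psutil.Process(os.getpid())


def rss_gib() -> float:
    """Return this process's resident set size in GiB."""
    return process.memory_info().rss / 1024**3


# Hypothetical usage: call once per iteration inside the training loop.
for index in range(3):
    print(f"iter {index}: rss = {rss_gib():10.4f} GiB")
```

If this per-process number grows steadily across iterations, the growth comes from the training script itself rather than from something else running on the machine.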

My question is: why is output not garbage collected, given that there are no references to it anymore? I suspect that the assignment outputs[indices] = np.array(output) might be the cause. I would appreciate it if someone could explain the right way to handle this so that memory usage stays constant across the iteration loop.
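
One way to test the suspicion about the assignment in isolation is a NumPy-only sketch like the one below. It is a minimal check under stated assumptions, not a diagnosis of the actual training loop: the random array stands in for model(input).detach().cpu().numpy(), and batch_size is a hypothetical value. It checks whether the pre-allocated buffer shares storage with the batch array and whether the batch array is freed once its name is deleted:

```python
import gc
import weakref

import numpy as np

# Sizes mirror the simplified code; batch_size is a hypothetical value.
num_train_examples = 50
output_size = 20
batch_size = 10

# Pre-allocated buffer (note the shape is passed as a tuple).
outputs = np.zeros((num_train_examples, output_size))

# Stand-in for model(input).detach().cpu().numpy(); not a real model call.
output = np.random.rand(batch_size, output_size)
indices = np.arange(batch_size)

# Index assignment copies the values into the existing buffer.
outputs[indices] = output

# Check whether the buffer and the batch array share any storage.
shares = np.shares_memory(outputs, output)

# Track the batch array with a weak reference, then drop its only name.
ref = weakref.ref(output)
del output
gc.collect()

# In CPython the weak reference is dead once nothing else holds the array.
batch_freed = ref() is None

print(f"shares memory: {shares}, batch freed: {batch_freed}")
```

If both checks behave the same way in the real loop, the assignment itself is unlikely to be what retains memory, and the growth would have to come from elsewhere (for example the data loader or the model call).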