Hi everyone, I have the following two cases in the iteration loop of one epoch; simplified code is below.
Case 1
for index, data in enumerate(train_loader):
    input = data['input']
    output = model(input)
Case 2
num_train_examples = 50
output_size = 20
outputs = np.zeros((num_train_examples, output_size))
for index, data in enumerate(train_loader):
    input = data['input']
    # Provides the indices of the training points in this batch
    indices = data['index']
    output = model(input).detach().cpu().numpy()
    outputs[indices] = np.array(output)
    del output
In case 1 I observe constant memory usage, but in case 2 the memory usage grows with the number of loop iterations.
I monitor memory usage with a snippet like this inside the iteration loop:

import psutil

mem = psutil.virtual_memory()
print(f'{mem.percent:5} - {mem.free/1024**3:10.2f} - {mem.available/1024**3:10.2f} - {mem.used/1024**3:10.2f}')
My question: why is output not garbage collected, given there are no references to it anymore? I suspect the assignment outputs[indices] = np.array(output) might be the cause. I'd appreciate it if someone could explain the right way to handle this so that memory usage stays constant across the iteration loop.
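In case it helps, here is a minimal standalone version of the Case 2 pattern I mean (the linear model and the random batches are stand-ins for my real model and DataLoader, not my actual code). One thing I'm considering is wrapping the forward pass in torch.no_grad() so no autograd graph is kept at all:

```python
import numpy as np
import torch
import torch.nn as nn

num_train_examples = 50
output_size = 20
input_size = 10
batch_size = 5

# Stand-in for my real model
model = nn.Linear(input_size, output_size)

# Preallocated buffer; note the shape is passed as a single tuple
outputs = np.zeros((num_train_examples, output_size))

for start in range(0, num_train_examples, batch_size):
    # Stand-ins for data['index'] and data['input'] from my DataLoader
    indices = np.arange(start, start + batch_size)
    batch = torch.randn(batch_size, input_size)
    with torch.no_grad():                 # no graph is built for this forward pass
        out = model(batch).cpu().numpy()  # already a NumPy array at this point
    outputs[indices] = out                # in-place copy into the buffer

print(outputs.shape)
```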