CUDA memory leak when summing over the outputs?

Hi! I have read this post: CUDA memory leakage.

In the discussion, I found this explanation, which makes great sense to me:
Often memory leaks are created by trying to store some training information like the loss without detaching it from the computation graph, which will store the whole graph with it.
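For reference, that pattern would look something like this (the model and loss below are hypothetical placeholders, not from the linked post):

import torch
import torch.nn as nn

# Minimal sketch of the "storing the loss without detaching" pattern.
model = nn.Linear(10, 1).cuda()
criterion = nn.MSELoss()

losses_leaky = []   # keeps every iteration's graph alive
losses_ok = []      # stores detached values only

for _ in range(100):
    x = torch.randn(32, 10, device="cuda")
    target = torch.randn(32, 1, device="cuda")
    loss = criterion(model(x), target)

    losses_leaky.append(loss)          # loss still has a grad_fn -> whole graph is retained
    losses_ok.append(loss.detach())    # or loss.item(); the graph can be freed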

However, I don’t understand why the memory usage remains the same when I just add all the outputs into a variable (without detaching). I am still storing the training information, right?

Thank you!


If you append a tensor with an attached computation graph (i.e. a valid .grad_fn), the computation graph will be stored with it and you should see increased memory usage in each iteration.
Is this the case for you, or are you seeing another issue?
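For example, a quick way to check whether a tensor still carries a graph (a sketch with a hypothetical toy model, not your code):

import torch
import torch.nn as nn

# Stand-in small model, just to show what an "attached computation graph" means.
net = nn.Conv2d(1, 5, kernel_size=3).cuda()
x = torch.randn(4, 1, 3, 3, device="cuda")
y = net(x)

print(y.grad_fn)           # e.g. <ConvolutionBackward0 ...> -> graph is attached
print(y.detach().grad_fn)  # None -> safe to store without keeping the graph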


Thank you for the reply! Let me post some code here so that I can explain it more clearly.

import torch

net = net.cuda()   # `net` is a small toy model defined earlier
sum = 0

while True:
    batch = 4
    h = 3
    w = 3
    num_outputs = 5
    x = torch.randn(batch, 1, h, w).cuda()
    y = net(x)
    sum += y   # y still carries its grad_fn, so each iteration's graph is kept alive

So basically, I am just summing all the outputs, and the computation graph should be attached to the outputs, right? However, the GPU usage remains the same. I don’t understand why.

Thanks!



Yes, this should be the case, as long as you didn’t wrap the code block in a torch.no_grad() block.

Which model are you using? Also, how are you checking the memory usage? Could you print it using torch.cuda.memory_allocated() in the loop?
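Something along these lines (a sketch, assuming a small toy model similar to your snippet):

import torch
import torch.nn as nn

# Sketch of the suggested check; `net` here is a stand-in toy model.
net = nn.Conv2d(1, 5, kernel_size=3).cuda()
acc = 0

for step in range(10):
    x = torch.randn(4, 1, 3, 3, device="cuda")
    y = net(x)                     # wrap in `with torch.no_grad():` to drop the graphs
    acc = acc + y
    print(step, torch.cuda.memory_allocated() / 1024**2, "MB allocated")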


Thank you for the instructions and sorry for the delayed reply.

I failed to reproduce the behavior I mentioned at the beginning; the GPU memory usage is increasing now.

I am just using a really small toy model, and I keep running nvidia-smi to check the memory usage.

Thank you so much!

PyTorch uses a custom caching memory allocator, which tries to reuse device memory.
Thus nvidia-smi shows the overall memory usage, including the CUDA context, the allocated memory, and the cached memory (as well as memory from other processes), so for a small model it might only show an increase after a couple of iterations.
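As a rough illustration of the different numbers (the tensor below is just an example):

import torch

x = torch.randn(1024, 1024, device="cuda")

print(torch.cuda.memory_allocated())  # memory currently occupied by live tensors
print(torch.cuda.memory_reserved())   # memory held by PyTorch's caching allocator
# nvidia-smi additionally includes the CUDA context and other processes,
# so its number is usually larger than both values above.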
