From the discussion, this explanation makes sense to me: memory leaks are often created by storing some training information, such as the loss, without detaching it from the computation graph, which stores the whole graph along with it.
However, I don’t understand why the memory usage remains the same when I simply accumulate all the outputs into a variable (with no detach operation). I think I am still storing the training information, right?
If you append a tensor with an attached computation graph (i.e., a valid .grad_fn), the computation graph will be stored with it and you should see increased memory usage in each iteration.
Is this the case for you or are you seeing any other issue?
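A minimal CPU-only sketch of the difference (assuming a toy nn.Linear stands in for your real model):

```python
import torch

net = torch.nn.Linear(3, 3)  # toy stand-in for the real model
outputs_attached = []
outputs_detached = []
for _ in range(3):
    x = torch.randn(4, 3)
    y = net(x)
    outputs_attached.append(y)           # keeps .grad_fn, so the whole graph is retained
    outputs_detached.append(y.detach())  # breaks the graph, stores only the values

print(outputs_attached[0].grad_fn is not None)  # True: graph is still attached
print(outputs_detached[0].grad_fn is None)      # True: graph was released
```

Every entry in the first list keeps its backward graph alive, so the retained memory grows with each appended output; the detached copies only hold the values.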
Thank you for the reply! Let me post some code here so that I can explain it more clearly.
import torch

net = net.cuda()  # net is assumed to be defined earlier
sum = 0  # note: shadows the built-in sum()
while True:
    batch = 4
    h = 3
    w = 3
    num_outputs = 5
    x = torch.randn(batch, 1, h, w).cuda()
    y = net(x)
    sum += y
So basically, I am just summing all the outputs, and the computation graph should be attached to the outputs, right? However, the GPU usage remains the same. I don’t understand why.
PyTorch uses a custom caching memory allocator, which will try to reuse the device memory.
Thus nvidia-smi shows the overall memory usage, including the CUDA context, the allocated memory, and the cached memory (also from other processes), so for a small model you might only see the increased memory usage after a couple of iterations.
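To see the actual allocations instead of relying on nvidia-smi, you can query the allocator directly. A hedged sketch (the tiny model here is an assumption, and the loop is guarded so it only runs when a GPU is present):

```python
import torch

# torch.cuda.memory_allocated() reports the memory occupied by live tensors,
# while torch.cuda.memory_reserved() includes the allocator's cache --
# the latter is closer to what nvidia-smi shows (minus the CUDA context).
if torch.cuda.is_available():
    net = torch.nn.Linear(3, 3).cuda()
    total = torch.zeros(4, 3, device="cuda")
    for step in range(5):
        x = torch.randn(4, 3, device="cuda")
        total = total + net(x)  # graph attached to total grows each iteration
        print(step,
              torch.cuda.memory_allocated(),
              torch.cuda.memory_reserved())
```

If the accumulated tensor keeps its graph, memory_allocated should creep up each step even while nvidia-smi appears flat, because new allocations are served from the already-reserved cache.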