GPU memory usage steadily increases in forward pass

Hi,

I check GPU memory usage with torch.cuda.max_memory_allocated(), and it steadily increases with each forward pass. In the forward pass, I need to store two intermediate tensors to compute the next result, similar to a Fibonacci sequence. So I think the problem may be in the way I access these intermediate tensors.
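
For reference, here is roughly how I check it (a simplified sketch with a placeholder model, not my real network; reset_peak_memory_stats just makes each iteration report its own peak):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()   # placeholder model, just for illustration
for i in range(5):
    x = torch.randn(64, 1024, device="cuda")
    out = model(x)
    # peak allocation since the last reset, in MB
    peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"iteration {i}: peak allocated {peak_mb:.1f} MB")
    # reset so each forward pass reports its own peak
    torch.cuda.reset_peak_memory_stats()
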
My method:

def forward(self, x):
    ...
    intermediate_result = []  # list of length 2 holding the results of iterations i-1 and i-2
    for i in range(nb_loops):
        next_result = func(intermediate_result)  # func computes the next result from the stored pair
        intermediate_result.pop(0)  # drop the oldest result (i-2); pop() without an index would drop i-1
        intermediate_result.append(next_result)
        ...

Could you give me some advice on how to maintain intermediate_result in the forward pass, and how to avoid this growing GPU memory usage?
Thanks in advance.

Hi,

This looks like a good way to maintain them.
What you need to check is how big your computational graph will be. In particular, if intermediate_result requires gradients, then next_result will too, and it will keep track of its history (which contains intermediate_result).
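
As a small standalone illustration of what "keep track of its history" means for memory (an elementwise multiply stands in for your func here; mul saves both of its inputs for the backward pass, so every past result stays reachable through the graph):

import torch

def run(requires_grad):
    a = torch.randn(1024, 1024, device="cuda", requires_grad=requires_grad)
    b = torch.randn(1024, 1024, device="cuda", requires_grad=requires_grad)
    intermediate_result = [a, b]
    for i in range(5):
        # mul saves both of its inputs for backward, so every past result
        # stays alive through the graph even after it leaves the list
        next_result = intermediate_result[0] * intermediate_result[1]
        intermediate_result.pop(0)
        intermediate_result.append(next_result)
        print(f"requires_grad={requires_grad}, step {i}: "
              f"{torch.cuda.memory_allocated() / 1024 ** 2:.0f} MB allocated")

run(True)    # allocated memory grows every step
run(False)   # allocated memory stays flat
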


Thanks for your reply!

So you mean that if intermediate_result requires gradients, it will not only keep the two tensors currently in intermediate_result, but also track all the preceding tensors in the computational graph, right? If so, the steadily growing memory usage is a normal case.

As you said, the memory usage increases during the first epoch and becomes stable afterwards.

Yes, it will track all the previous ones.

Can you explain which part you want to backpropagate through?

  • Do you want to go all the way to the first iteration?
  • Do you want to backprop only the last func eval?
  • Do you want to backprop only the last two?
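
For reference, truncating the history with .detach() is one common way to implement the last two options. A minimal sketch, with a hypothetical helper and a generic func (not your actual code):

import torch

def truncated_loop(intermediate_result, func, nb_loops, keep_last=2):
    # hypothetical helper: backprop only through the last `keep_last`
    # calls of func by detaching the stored state just before them
    for i in range(nb_loops):
        if i == nb_loops - keep_last:
            # cut the graph here: earlier iterations become constants
            intermediate_result = [t.detach() for t in intermediate_result]
        next_result = func(intermediate_result)
        intermediate_result.pop(0)
        intermediate_result.append(next_result)
    return intermediate_result[-1]

# toy usage: func mixes the two stored results with a learnable weight
w = torch.randn(8, 8, requires_grad=True)
out = truncated_loop([torch.randn(8, 8), torch.randn(8, 8)],
                     lambda xs: (xs[0] + xs[1]) @ w, nb_loops=10)
out.sum().backward()   # w.grad only reflects the last two func calls
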

Sorry for the late reply.

Yes, I want to go all the way to the first iteration and backprop to i_0 (i.e. the input of the network). Additionally, during the forward pass, in each iteration the intermediate feature i_k is selected based on Gumbel-Softmax (i_k can have a different size, so it does not have a constant GPU memory consumption), which I think also consumes additional GPU memory.
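
Roughly, the selection looks something like this (a very simplified sketch with placeholder shapes, not my actual code):

import torch
import torch.nn.functional as F

# placeholder candidate features of different sizes (not the real shapes)
candidates = [torch.randn(1, 64, 56, 56, device="cuda"),
              torch.randn(1, 128, 28, 28, device="cuda"),
              torch.randn(1, 256, 14, 14, device="cuda")]
logits = torch.randn(len(candidates), device="cuda", requires_grad=True)

# hard=True samples a one-hot choice while keeping a differentiable path
weights = F.gumbel_softmax(logits, tau=1.0, hard=True)
k = int(weights.argmax())
i_k = candidates[k] * weights[k]   # scaling keeps the graph connected to logits
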

So, during the forward pass, the GPU memory reported by nvidia-smi sometimes increases and sometimes stays stable, and an OOM error can be raised at any time (maybe when the forward pass uses large intermediate features?).

I would like to know: how can I control the GPU memory usage? Does the Gumbel-Softmax in each iteration cause this growing memory usage? Could I use torch.cuda.empty_cache() after each epoch to release memory?

Thanks for your help!

Hi,

If you need gradients all the way back to the first iteration, it is expected that the memory usage will be quite high, as autograd needs to keep all the intermediate buffers required to compute the gradients.

Does the Gumbel-Softmax in each iteration cause this growing memory usage?

It is all the intermediate results required to compute the gradients that are expensive. You can try to use checkpointing (torch.utils.checkpoint) to reduce the memory usage at the cost of more compute.
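
For example, here is a minimal sketch of applying torch.utils.checkpoint.checkpoint to a loop of this shape (step and the shapes stand in for your func; the weight is passed explicitly so the checkpointed call sees an input that requires gradients):

import torch
from torch.utils.checkpoint import checkpoint

def step(prev, cur, w):
    # stand-in for func: activations created inside this call are not stored,
    # they are recomputed during the backward pass instead
    return torch.tanh((prev + cur) @ w)

w = torch.randn(512, 512, device="cuda", requires_grad=True)
prev = torch.randn(512, 512, device="cuda")
cur = torch.randn(512, 512, device="cuda")

for i in range(20):   # stand-in for nb_loops
    # only the inputs/outputs of each checkpointed call are kept in memory
    nxt = checkpoint(step, prev, cur, w)
    prev, cur = cur, nxt

cur.sum().backward()   # trades extra compute for lower peak memory
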

Could I use torch.cuda.empty_cache() after each epoch to release memory?

That won’t reduce the memory usage, as the memory in the cache is reused to allocate more Tensors. The only thing it will do is slow down your code by forcing the cache to be emptied and refilled repeatedly.
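
To see the difference between allocated memory and cached memory, you can compare the two counters (small sketch; memory_reserved exists on recent PyTorch versions, older ones call it memory_cached):

import torch

x = torch.randn(1024, 1024, device="cuda")
del x   # the block goes back to PyTorch's caching allocator, not to the driver

print(torch.cuda.memory_allocated() / 1024 ** 2, "MB allocated")         # ~0
print(torch.cuda.memory_reserved() / 1024 ** 2, "MB held in the cache")  # > 0

# empty_cache() returns the cached blocks to the driver; the next allocation
# will simply have to cudaMalloc them again, which is slower
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved() / 1024 ** 2, "MB held after empty_cache")
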
