GPU memory accumulates during consecutive forward passes

I am running a densenet161 image classifier on a GPU.

After several consecutive forward passes (2-5 batches of images, batch_size=128), memory accumulates and is not freed. Can someone tell me what is taking up so much memory?
Btw, I am using torch.no_grad() during the forward pass.

@ptrblck It would also be helpful if you could explain which factors determine memory usage during inference.

Could you post a snippet of the code where you do the forward pass, so that we can take a look?

I am using an Inference class, and the following code is one of its member functions. Every batch has size 128.

def predict(self, image_objs_batches):
    for batch in image_objs_batches:
        # Disable autograd tracking so no graph is stored for this forward pass
        with torch.no_grad():
            _, outputs = self.model(batch)

Are you sure the memory leak is in the inference stage? If not, could you post the training loop? If you accumulate the losses in a list or something similar without calling .item() on them, each stored loss keeps its computation graph alive and memory usage will keep growing.
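To make the point about .item() concrete, here is a minimal sketch. The tiny linear model, the random data, and the list names are hypothetical stand-ins, not the original poster's code. It only shows the difference between storing the loss tensor and storing a plain Python float.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # hypothetical stand-in for a real model
criterion = nn.MSELoss()

graph_losses = []   # tensors that still reference their autograd graph
float_losses = []   # plain Python floats, no graph attached

for _ in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = criterion(model(x), y)
    graph_losses.append(loss)          # keeps this iteration's graph alive
    float_losses.append(loss.item())   # copies the value out as a float

# Each stored tensor still has a grad_fn, so its graph cannot be freed.
print([l.grad_fn is not None for l in graph_losses])
print([type(l).__name__ for l in float_losses])
```

If the tensor itself is needed later, loss.detach() should also drop the reference to the graph while keeping a tensor.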

I am facing this memory accumulation issue only during inference, not during training.

Hi there,

_, outputs = self.model(batch) is not within the context manager as per your snippet.
Put a tab on the line to push it within the torch.no_grad() context manager.

Since you asked about other factors, one could be the batch size. In your case this might not be the solution, but increasing the batch size means computing and keeping larger tensors on your device (activations, plus gradients whenever they are tracked), which eventually eats up device memory. This assumes that you have a pretrained model that was not trained on your device and that you are only running inference.
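As a rough illustration of how batch size feeds into inference memory, the sketch below compares the fixed size of the weights with the size of the output tensor at two batch sizes. The small convolutional model and the tensor_bytes helper are hypothetical. A real network such as densenet161 also holds intermediate activations, and on the GPU there is additional overhead (CUDA context, library workspaces) that this does not measure.

```python
import torch
import torch.nn as nn

def tensor_bytes(t):
    """Bytes occupied by the elements of a tensor (hypothetical helper)."""
    return t.numel() * t.element_size()

# Hypothetical small model standing in for a real classifier.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU()).eval()

# Weight memory does not depend on the batch size.
param_bytes = sum(tensor_bytes(p) for p in model.parameters())

# Output (activation) memory grows with the batch size.
output_bytes = {}
with torch.no_grad():
    for batch_size in (1, 4):
        out = model(torch.randn(batch_size, 3, 16, 16))
        output_bytes[batch_size] = tensor_bytes(out)

print("parameters:", param_bytes, "bytes")
for batch_size, nbytes in output_bytes.items():
    print("batch size", batch_size, "-> output:", nbytes, "bytes")
```

On a GPU, torch.cuda.max_memory_allocated() can be used to read the peak usage of a real forward pass instead of estimating it tensor by tensor.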


Yes, there is a tab. It somehow got rearranged while copying the code. Inference is being done inside the torch.no_grad() context manager.

I am pretty sure gradients are not computed, but GPU memory steadily increases during consecutive forward passes.

Hey, any update on this? I am facing the same issue on the latest version, 1.9, even when the forward pass is wrapped in with torch.no_grad().

Could you post a minimal and executable code snippet, which would show the increase in memory in PyTorch 1.9.0 using the no_grad guard, please?

Hi @ptrblck, thanks for your response. I figured out that the issue is not related to the forward pass. Rather, PyTorch was caching the memory, which is why I saw an increased memory footprint in nvidia-smi. I have figured out the solution. Thanks once again.
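The thread does not say which fix was applied. For anyone who wants to check whether they are seeing cached memory rather than a leak, here is a minimal sketch that compares the memory occupied by live tensors with the memory held by PyTorch's caching allocator. The cuda_memory_stats helper is a hypothetical name. It relies only on torch.cuda.memory_allocated(), torch.cuda.memory_reserved() and torch.cuda.empty_cache(), and it skips the measurement when no GPU is present.

```python
import torch

def cuda_memory_stats():
    """Return allocated/reserved bytes on the current GPU, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated(),  # memory used by live tensors
        "reserved": torch.cuda.memory_reserved(),    # memory held by the caching allocator
    }

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print("after allocation: ", cuda_memory_stats())
    del x
    # The tensor is gone, but its block typically stays cached for reuse.
    print("after del:        ", cuda_memory_stats())
    torch.cuda.empty_cache()
    # Unused cached blocks are handed back to the driver.
    print("after empty_cache:", cuda_memory_stats())
else:
    print("CUDA not available, nothing to measure")
```

nvidia-smi reports the memory the process holds from the driver, which includes the cached (reserved) part, so it can stay high even when the allocated figure drops. Calling torch.cuda.empty_cache() lowers the number nvidia-smi shows, but it does not make more memory available to PyTorch itself, since cached blocks are reused anyway.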