Multiple calls to autograd.grad with the same graph increase memory usage

I’m trying to generate multiple saliency maps for a model with multiple outputs and a single input image. I’m calling torch.autograd.grad once for each of the outputs because I want separate gradients for each output.

My problem is that when I call torch.autograd.grad multiple times, each call increases the GPU memory usage. Here’s a code snippet:

    input_gradients = []
    for output_scalar in multiple_output_scalars:
        self.model.zero_grad()  # not needed, right?
        input_gradients.append(torch.autograd.grad(
            outputs=output_scalar,
            inputs=x,
            retain_graph=True,
            create_graph=False)[0].cpu().detach())
    input_gradients = torch.cat(input_gradients, dim=0)

I probably misunderstand what autograd is doing, but to my understanding it should be using the computation graph to compute the gradients and returning them. Nothing new should be stored, and since the graph is retained, nothing needs to be rebuilt. But clearly I’m wrong. Where is my thinking flawed, and what is the right way to do this?

Hi,

Since you append the result to a list at every iteration, it is expected that it uses more and more memory, no?
Also, there is indeed no need to call zero_grad when using autograd.grad.
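
To illustrate the second point (just a toy sketch, the tensor names are made up): autograd.grad returns the gradients directly instead of accumulating them into .grad, so there is nothing for zero_grad to clear between calls.

    import torch

    # Toy check: autograd.grad returns the gradients instead of
    # accumulating them into .grad, so zero_grad() has nothing to do here.
    x = torch.randn(3, requires_grad=True)
    y = (x ** 2).sum()

    g = torch.autograd.grad(outputs=y, inputs=x)[0]
    print(g)       # gradients returned directly by the call
    print(x.grad)  # still None: .grad was never populated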

Sorry, I should clarify. It’s GPU memory that’s growing, while I store the gradients on the CPU.

Do you actually run out of memory? It might just be the allocator that takes more and more from the driver but does not actually use it.
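
One way to tell the two apart (a rough sketch, to be run inside your loop): compare the memory actually backing live tensors with what the caching allocator has reserved from the driver. If only the reserved number grows, it is just the allocator caching memory rather than real usage.

    import torch

    # Memory actually backing live tensors vs. memory the caching
    # allocator has reserved from the driver but may not be using.
    print(torch.cuda.memory_allocated() / 1e6, "MB allocated")
    print(torch.cuda.memory_reserved() / 1e6, "MB reserved")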

I do run out of GPU memory.

Ok,

Could you provide a small code snippet that we could run (on colab for example) that shows the issue please?

So, a simple model doesn’t reproduce the increase in memory at each call. The model that is causing the problem is quite complicated. It uses a CNN detector, hooks that store features which are ROIAligned later, and several graph network calls. Any ideas on what in a model could cause an increase in memory with the autograd call?

So removing the graph network stuff doesn’t change the memory error. This colab is my attempt to reproduce it, but it runs without any memory increase: https://colab.research.google.com/drive/1IiVFdB2hUttfgrpgwCnOXYmxKN1-43S0?usp=sharing

Hard to say without code.
But if your hooks store stuff, that sounds dangerous :smiley:

There are only forward hooks, and they’d overwrite the last stored data.

Okay, never mind. There was an extra backward hook being added in the saliency code I copied. Clearing its stored data fixed the memory issue.
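
For anyone who hits the same thing, the pattern looked roughly like this (names are made up, just a minimal sketch of the bug and the fix):

    import torch
    import torch.nn as nn

    # A hook that appends to a list keeps every stored gradient tensor
    # alive on the GPU, so memory grows with each autograd.grad call.
    stored = []

    def debug_hook(module, grad_input, grad_output):
        stored.append(grad_output[0])

    model = nn.Linear(10, 10).cuda()
    handle = model.register_full_backward_hook(debug_hook)

    x = torch.randn(1, 10, device="cuda", requires_grad=True)
    out = model(x).sum()

    for _ in range(3):
        torch.autograd.grad(out, x, retain_graph=True)
        stored.clear()  # clearing the stored data after each call stops the growth

    handle.remove()  # or drop the hook entirely if it is not needed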
Thanks for your help!
