I’m trying to generate multiple saliency maps for a model with multiple outputs and a single input image. I’m calling torch.autograd.grad once per output because I want separate gradients for each output.
My problem is that each call to torch.autograd.grad increases the GPU memory usage. Here’s a code snippet:
I probably misunderstand what autograd is doing, but to my understanding it should be using the computation graph to compute the gradients and returning them. Nothing new should be stored, and since the graph is retained, nothing needs to be rebuilt. But clearly I’m wrong. Where is my thinking flawed, and what is the right way to do this?
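Roughly, the loop looks like this (a minimal sketch rather than the exact code; the stand-in model, shapes, and device are placeholders for the real, much larger setup):

```python
import torch

# Stand-in model: placeholder for the real network that maps a single
# image to several scalar outputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 224 * 224, 5),
).to(device)

image = torch.randn(1, 3, 224, 224, device=device, requires_grad=True)
outputs = model(image)  # shape (1, num_outputs)

saliency_maps = []
for i in range(outputs.shape[1]):
    # One call per output; retain_graph=True keeps the graph alive so
    # the next output can reuse it.
    grad, = torch.autograd.grad(outputs[0, i], image, retain_graph=True)
    saliency_maps.append(grad)
```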
Since you append the result to a list at every iteration, isn’t it expected that memory usage keeps growing?
Also, there is indeed no need to call zero_grad when using autograd.grad.
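For example (a sketch based on your loop above), you could move each gradient to the CPU before storing it, so the list doesn’t hold on to GPU memory across iterations:

```python
saliency_maps = []
for i in range(outputs.shape[1]):
    grad, = torch.autograd.grad(outputs[0, i], image, retain_graph=True)
    # Copy the gradient to the CPU before storing it, so accumulating it
    # in the list doesn't grow GPU memory with each iteration.
    saliency_maps.append(grad.detach().cpu())
```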
So, a simple model doesn’t reproduce the memory increase at each call. The model that is causing the problem is quite complicated. It uses a CNN detector, hooks that store features which are ROIAligned later, and several graph network calls. Any ideas on what in a model could cause an increase in memory with the autograd call?
Okay, never mind. There was an extra backward hook being added in the saliency code I copied. Clearing the stored data fixed the memory issue.
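For anyone hitting the same thing, the problem looked roughly like this (a hypothetical illustration; the layer and hook here are stand-ins for the copied saliency code, which hooked a layer inside a much larger detector):

```python
import torch

# Stand-in layer for the layer the copied saliency code hooked.
target_layer = torch.nn.Conv2d(3, 8, kernel_size=3)

stored_grads = []

def save_grad(module, grad_input, grad_output):
    # Each backward pass appends another GPU tensor that is never freed.
    stored_grads.append(grad_output[0])

handle = target_layer.register_full_backward_hook(save_grad)

# ... one torch.autograd.grad call per output runs here ...

# The fix: clear the stored data after each saliency map (and remove the
# hook once it is no longer needed) so nothing accumulates across calls.
stored_grads.clear()
handle.remove()
```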
Thanks for your help!