I’m trying to generate multiple saliency maps for a model with multiple outputs and a single input image. I’m calling torch.autograd.grad once per output because I want separate gradients for each output.
My problem is that each call to torch.autograd.grad increases the GPU memory usage. Here’s a code snippet:
I probably misunderstand what autograd is doing, but to my understanding it should be using the computation graph to compute the gradients and returning them. Nothing new should be stored, and since the graph is retained, nothing needs to be rebuilt. But clearly I’m wrong. Where is my thinking flawed, and what is the right way to do this?
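Roughly, the loop looks like this (a minimal sketch rather than the exact code; the stand-in model, shapes, and device are placeholders for the real, much larger setup):

```python
import torch

# Stand-in model: placeholder for the real network that maps a single
# image to several scalar outputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 224 * 224, 5),
).to(device)

image = torch.randn(1, 3, 224, 224, device=device, requires_grad=True)
outputs = model(image)  # shape (1, num_outputs)

saliency_maps = []
for i in range(outputs.shape[1]):
    # One call per output; retain_graph=True keeps the graph alive so
    # the next output can reuse it.
    grad, = torch.autograd.grad(outputs[0, i], image, retain_graph=True)
    saliency_maps.append(grad)
```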
Since you append the result to a list at every iteration, isn’t it expected that memory usage keeps growing?
Also, there is indeed no need to call zero_grad when using autograd.grad.
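For example (a sketch based on your loop above), you could move each gradient to the CPU before storing it, so the list doesn’t hold on to GPU memory across iterations:

```python
saliency_maps = []
for i in range(outputs.shape[1]):
    grad, = torch.autograd.grad(outputs[0, i], image, retain_graph=True)
    # Copy the gradient to the CPU before storing it, so accumulating it
    # in the list doesn't grow GPU memory with each iteration.
    saliency_maps.append(grad.detach().cpu())
```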
So, a simple model doesn’t reproduce the memory increase at each call. The model that is causing the problem is quite complicated. It uses a CNN detector, hooks that store features which are ROIAligned later, and several graph network calls. Any ideas on what in a model could cause an increase in memory with the autograd call?
Okay, never mind. There was an extra backward hook being added in the saliency code I copied. Clearing the stored data fixed the memory issue.
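For anyone hitting the same thing, the problem looked roughly like this (a hypothetical illustration; the layer and hook here are stand-ins for the copied saliency code, which hooked a layer inside a much larger detector):

```python
import torch

# Stand-in layer for the layer the copied saliency code hooked.
target_layer = torch.nn.Conv2d(3, 8, kernel_size=3)

stored_grads = []

def save_grad(module, grad_input, grad_output):
    # Each backward pass appends another GPU tensor that is never freed.
    stored_grads.append(grad_output[0])

handle = target_layer.register_full_backward_hook(save_grad)

# ... one torch.autograd.grad call per output runs here ...

# The fix: clear the stored data after each saliency map (and remove the
# hook once it is no longer needed) so nothing accumulates across calls.
stored_grads.clear()
handle.remove()
```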
Thanks for your help!