Memory leaks from custom function

I have a complicated custom autograd function that relies on a couple of mechanisms to work, both of which appear to be causing memory leaks. I have simplified the function so that it essentially does nothing, and it still leaks when combined with those mechanisms. The simplified function is the following:

import torch

class Tagger(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, Values):
        # stash Values on ctx so it is available in backward
        ctx.savedValues = Values
        return inp

    @staticmethod
    def backward(ctx, grad_out):
        # gradient for inp; None for Values
        return grad_out * 0, None

There are two things that I believe are causing issues. The first is that between the forward and backward passes I add a new tensor to ‘Values’ as an attribute. Even though this tensor is unused by the empty function, it still seems to cause leaks. The simplest example I have found that causes a leak is:

Values.newVariable = torch.zeros(1)

The second place that seems to cause problems is setting the ‘grad_out’ of backward manually. I am doing this with:

self.earlierOutputValue.register_hook(lambda grad: gradientValue)

I suspect it might be because I am assigning it to a self variable? But this is required since I am setting it with a value that is not calculated until after the forward pass.
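
In case it helps, here is a trimmed-down sketch of that hook pattern (earlierOutputValue and gradientValue here are just placeholders standing in for tensors in my real code):

import torch

x = torch.randn(3, requires_grad=True)
earlierOutputValue = x * 2  # some intermediate output from the forward pass

# this value is only computed after the forward pass has finished
gradientValue = torch.full_like(earlierOutputValue, 5.0)

# replace whatever gradient flows into earlierOutputValue with gradientValue
earlierOutputValue.register_hook(lambda grad: gradientValue)

earlierOutputValue.sum().backward()
print(x.grad)  # 2 * 5 = 10 everywhere, driven by gradientValue rather than sum()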

Hi,

You can check the doc on creating custom Functions for more details: https://pytorch.org/docs/stable/notes/extending.html
But the gist is that if you need to save an input/output of the forward, you have to use ctx.save_for_backward(Values) and get it back with Values, = ctx.saved_tensors; otherwise you will see these leaks.
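
For example, the simplified function above would become something like this (just a sketch, assuming Values is a single Tensor):

class Tagger(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, Values):
        # let autograd manage the saved Tensor instead of stashing it on ctx
        ctx.save_for_backward(Values)
        return inp

    @staticmethod
    def backward(ctx, grad_out):
        Values, = ctx.saved_tensors
        return grad_out * 0, None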


Gotcha, I am in the process of converting to that. Now I am getting the error “RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 10]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).” followed by “Uncaught exception. Entering post mortem debugging”, which I imagine is because that is exactly what I am doing. Is there a way to turn off this error or get around it? The value I need to use is not calculated until after the forward pass. It also does not require a gradient, if that helps.
Edit: one of my other saved tensors does require a gradient, so that does not help. I do need to entirely modify one of these saved tensors, i.e. I need ctx.save_for_backward to effectively save just a pointer rather than a snapshot of the tensor, so I can change it later. The approach from (How to transition to functions not being allowed to have member variables) was working; it is just causing the memory leaks that I am now seeing with a larger network.
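
In case it is useful, here is a minimal sketch of the pattern I believe is tripping the check (the copy_ stands in for however the real value gets filled in after the forward pass):

import torch

class Tagger(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, Values):
        ctx.save_for_backward(Values)
        return inp

    @staticmethod
    def backward(ctx, grad_out):
        Values, = ctx.saved_tensors  # the version check happens here
        return grad_out * 0, None

x = torch.randn(4, requires_grad=True)
Values = torch.zeros(4)
out = Tagger.apply(x, Values)

# filling Values in-place after the forward bumps its version counter
Values.copy_(torch.randn(4))

out.sum().backward()  # RuntimeError: ... modified by an inplace operation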

So you deliberately save Values in the forward, but that Tensor will only be populated with the right value later?
And does that Tensor require gradients?


I'm not 100% sure, but I think I was mistaken: I should not need gradients. But yes, I do need a Tensor to be populated with the right value after the forward pass through my function.

In that case, if it doesn’t require gradients, I would just capture it to avoid this issue altogether: it is just a constant Tensor that the autograd doesn’t need to know about.

def tagger(inp, Values):
  # Values is captured by the closure, so autograd never sees or saves it
  class Tagger(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
      return inp

    @staticmethod
    def backward(ctx, grad_out):
      foo = Values  # use the captured Tensor however you need here
      return grad_out * 0  # only one gradient, since forward only takes inp

  return Tagger.apply(inp)
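
At the call site it would then be something like this (sketch):

import torch

x = torch.randn(5, requires_grad=True)
Values = torch.zeros(1)  # can be filled in later, autograd never saved it
out = tagger(x, Values)

Values.copy_(torch.randn(1))  # safe: no version check, nothing was saved

out.sum().backward()
print(x.grad)  # all zeros, coming from the grad_out * 0 above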

Hmm, that change does yield the same results, and there are fewer additional tensors being added on each backward pass, but it looks like there are still some memory leaks. I made the following change to the code that applies the function.

# original
definedTagger = tagger()
...
applier = definedTagger.apply
outs = applier(outs, Values)

# new
outs = tagger(outs, Values)

Is it possible things are being added when the tensors in Values are used? Maybe I need more detach() calls or other things like that?
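
Concretely, I am wondering whether I need something like this wherever I stash a tensor that came out of the network (the names here are placeholders, not my real code):

import torch

class Holder:  # stand-in for whatever my real Values object is
    pass

Values = Holder()
net = torch.nn.Linear(4, 4)  # stand-in for part of my network
someNetworkOutput = net(torch.randn(1, 4))

# leak-prone: newVariable keeps the whole graph that produced it alive
Values.newVariable = someNetworkOutput

# what I suspect I need: drop the graph reference before stashing the value
Values.newVariable = someNetworkOutput.detach()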

Could you share a full code sample that I can run and that fails? I think that will be faster 🙂

Hey! In the process of doing that, among other things, I caught a place where I needed a detach() and did not have one. At this point it looks like the memory leak is gone, so the earlier suggestion worked! Thanks!
