I have a complicated custom autograd function that relies on a couple of mechanisms to work, both of which are causing memory leaks. Even after simplifying the function to essentially a no-op, the leaks persist as long as the other mechanisms are present.
There are two things that I believe are causing issues. The first is that between forward and backward I add a new tensor to Values. Even when this tensor is unused by the empty function, it still seems to cause leaks. The simplest example I have found that causes a leak is:
Values.newVariable = torch.zeros(1)
The second place that seems to cause problems is setting grad_out of backward manually. I am doing this by assigning the value to an attribute on the Function instance. I suspect it might be because I am assigning it to a self variable? But this is required, since I am setting it with a value that is not calculated until after the forward pass.
You can check the doc on creating custom Functions for more details: https://pytorch.org/docs/stable/notes/extending.html
But the gist is that if you need to save an input/output of the forward, you have to use ctx.save_for_backward(Values) and get it back with Values, = ctx.saved_tensors; otherwise you will see these leaks.
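The recommended pattern might look like this minimal sketch (the Square function and its names are illustrative, not the code from this thread):

```python
import torch

class Square(torch.autograd.Function):
    """Illustrative function that saves its input the recommended way."""

    @staticmethod
    def forward(ctx, x):
        # Save tensors for backward via ctx, not via attributes on self.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve exactly what was saved in forward.
        x, = ctx.saved_tensors
        return grad_output * 2 * x

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor(6.)
```

Because the saved tensors go through ctx, autograd can free them once the backward pass has consumed them, instead of keeping them alive on the Function object.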
Gotcha. In the process of converting that, I am now getting the error: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 10]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Uncaught exception. Entering post mortem debugging." Which I imagine is because that is exactly what I am doing. Is there a way to turn off this error or get around it? The value I need to use is not calculated until after the forward pass. It also does not require a gradient, if that helps.
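For context, this version-counter error can be reproduced outside any custom Function with a few lines (a hypothetical minimal repro, not the code from this thread):

```python
import torch

a = torch.ones(3, requires_grad=True)
x = a * 1          # non-leaf tensor tracked by autograd
y = x * x          # mul saves x for its backward pass
x.add_(1)          # in-place edit bumps x's version counter

try:
    y.sum().backward()
    failed = False
except RuntimeError as e:
    # Autograd notices the saved tensor's version changed since forward.
    failed = True
    print(e)
```

Autograd stamps every tensor it saves with a version number; if an in-place op changes the tensor before backward reads it, the versions no longer match and the backward pass refuses to run.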
Edit: one of my other ones does require a gradient, so that does not help. I do need to entirely modify one of these saved tensors, i.e. I need ctx.save_for_backward to just save a pointer rather than the actual tensor, so I can change it later. (The approach from "How to transition to functions not being allowed to have member variables" was working; it is just causing the memory leaks that I am now seeing with a larger network.)
So you willingly save Values in the forward, but that Tensor will be populated with the right value only later?
And does that Tensor require gradients?
I'm not 100% sure, but I think I was mistaken. I should not need gradients. But yes, I do need a Tensor to be populated with the right value after the forward pass through my function.
In that case, if it doesn't require gradients, I would just capture it to avoid this issue altogether: it is just a constant Tensor that the autograd doesn't need to know about.
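One way to read this suggestion, sketched with hypothetical names: keep the late-arriving tensor in a plain Python reference that autograd never sees, and fill it in after the forward pass.

```python
import torch

class ScaleByExternal(torch.autograd.Function):
    # Hypothetical mutable holder; its contents can be filled in after
    # the forward pass, and autograd never tracks them.
    holder = {"scale": None}

    @staticmethod
    def forward(ctx, x):
        # Nothing is saved for backward; the constant lives outside autograd.
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Read the captured constant at backward time.
        return grad_output * ScaleByExternal.holder["scale"]

x = torch.ones(3, requires_grad=True)
y = ScaleByExternal.apply(x).sum()
ScaleByExternal.holder["scale"] = torch.tensor(2.0)  # populated after forward
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```

Since the captured tensor never enters save_for_backward, the version-counter check never applies to it, and it can be rewritten freely between forward and backward.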
Hmm, that change does yield the same results, and there are fewer additional tensors being added on each backward pass, but it looks like there are still some memory leaks. I made a corresponding change to the apply code.
Hey! In the process of doing that, among other things, I caught a place where I needed a detach() but did not have one. The memory leak now looks to be gone, so the earlier suggestion worked! Thanks!
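For reference, a minimal sketch of why a missing detach() can leak: storing a tensor that is still attached to the graph keeps the whole graph (and every tensor it saved) reachable, while detach() keeps only the value.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()

# Storing y itself would keep the whole autograd graph reachable.
# detach() returns a tensor with the same value but no graph history.
stored = y.detach()
print(stored.requires_grad)  # False
print(stored.item())         # 6.0
```

Anything held across iterations (running losses, logged activations, tensors stashed for later use) is a candidate for this kind of fix.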