How to modify gradients in-place?

I am working on an unusual research project and need to modify some gradients in-place.

This worked fine so far, but I just updated to a newer version of PyTorch, and now I am getting this error:

“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [762, 5]], which is output 0 of TBackward, is at version 5; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!”

I get that this is an error for almost every use case, but I actually want to do this on purpose.

Is there a way to turn this error off?

Alternatively, is there another way to modify gradients that is legal / preferred?

(My situation: I’m running a network where the architecture is not fixed but changes based on intermediate results. To backpropagate through this safely, I break the network into smaller parts, and manually retrieve and re-insert the gradients at the appropriate tensors.)

Would your workflow work with hooks (via tensor.register_hook()) to manipulate the gradients?
If not, could you try to wrap the gradient manipulations into a with torch.no_grad() block, if that’s not already the case?
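For anyone reading along, here is a minimal sketch of both suggestions. The tensors x and h and the scaling and clamping values are made up for illustration and are not from the original poster's code.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
h = x * 2  # intermediate (non-leaf) tensor

# Option 1: a hook receives the gradient w.r.t. h during backward and
# returns the tensor that should be used in its place.
handle = h.register_hook(lambda grad: grad * 0.5)
h.sum().backward()
handle.remove()
grad_after_hook = x.grad.clone()  # 0.5 * 2 = 1.0 per element

# Option 2: edit an already-computed .grad in place. Doing this inside
# no_grad() keeps autograd from recording the edit itself.
with torch.no_grad():
    x.grad.clamp_(max=0.5)
```

The hook changes the gradient while it flows backward; the no_grad() block changes it only after backward has finished.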

tensor.register_hook() would have worked, but I would have to refactor everything. It would take days, and I might lose some functionality, because the system I already built is more flexible for my purposes. (Hooks get executed immediately, whereas I build a prioritized queue of calculations that still need to be performed.)

I’m not sure I understand how torch.no_grad() would help. I am performing normal gradient calculations through backpropagation. Then I apply modifications to the generated gradients.

Note that

  1. The error message comes from modifying something in-place that was used in the forward pass and is needed for the backward pass. You might check the backtrace mentioned in the hint quoted below. (Note that it typically goes to C++ stderr and is not shown in Jupyter, unless you use the nightlies, where it goes to Python stderr.)

     Hint: the backtrace further above shows the operation that failed to compute its gradient.

  2. For the type of thing you are doing, I would probably not use backward for the intermediates but use torch.autograd.grad instead. Basically, backward is best thought of as existing for the very narrow case of “compute gradients to be fed into the optimizer”.
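A rough sketch of what splitting a network and passing gradients by hand could look like with torch.autograd.grad. The two-part "network", the shapes, and the clamp are assumptions for illustration only, not the original poster's setup.

```python
import torch

torch.manual_seed(0)
w1 = torch.randn(4, 3, requires_grad=True)
w2 = torch.randn(3, 2, requires_grad=True)
x = torch.randn(5, 4)

# Part 1 of the network.
h = x @ w1

# Cut the graph: part 2 sees h_cut as a fresh leaf tensor.
h_cut = h.detach().requires_grad_(True)
loss = (h_cut @ w2).pow(2).sum()

# Gradients of part 2, returned as plain tensors (nothing is written to .grad).
g_w2, g_h = torch.autograd.grad(loss, [w2, h_cut])

# The result is yours to modify freely, here with an out-of-place clamp.
g_h = g_h.clamp(-1.0, 1.0)

# Re-insert the modified gradient at the cut and continue through part 1.
(g_w1,) = torch.autograd.grad(h, [w1], grad_outputs=g_h)
```

Since g_h is an ordinary tensor that no graph depends on, changing it cannot trip the in-place check.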

Best regards

Thomas

That’s interesting. I didn’t know there was a difference between torch.autograd.grad and torch.autograd.backward. It looks like the first one is functional, while the second has side-effects. Is that correct? Is there anything you would recommend to read to understand this better? I’m worried I might break something if I make a switch at this point without fully understanding everything.

Yeah, so basically, the difference is “only” functional vs. side-effects.
For optimizers, you need .backward, as .grad is what they consume.
But for other purposes, it would seem that autograd.grad makes it clear that the result is yours.
As for reading, probably a very careful reading of the documentation is the best option (the Autograd Mechanics note and the torch.autograd reference come to mind).
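To make the functional vs. side-effect distinction concrete, here is a small sketch. The parameter w, the SGD optimizer, and the learning rate are arbitrary choices for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Functional: the gradient is returned, and w.grad is left untouched.
(g,) = torch.autograd.grad((w ** 2).sum(), [w])
grad_attr_after_functional = w.grad  # still None

# Side effect: backward() accumulates into w.grad on every call.
(w ** 2).sum().backward()
(w ** 2).sum().backward()
accumulated = w.grad.clone()  # 2 * (2 * w) = [4., 8.]

# If an optimizer should consume a functional gradient, it can be
# assigned to .grad explicitly before stepping.
opt = torch.optim.SGD([w], lr=0.1)
w.grad = g.clone()
opt.step()
```

The explicit assignment in the last step is one way to keep autograd.grad for the intermediates and still use a standard optimizer at the end.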

Best regards

Thomas

Thanks! That’s good to know. Now I just need to figure out if that’s worth all the refactoring…

Quick update: In the meantime, I have noticed that my code contained a bug. I tried to apply a loss to a calculation after some parameters used in it had already been modified by another calculation. This is what caused the error message. The question I asked is still valid, and the answers are still useful. But just in case somebody wonders, the reason for the error message was not the in-place modification of gradients.
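In case it helps someone who lands here with the same message, this is a minimal sketch of that kind of bug, not my actual code. The tensor w and the in-place update are stand-ins for a parameter that gets changed between the forward pass and the backward pass.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()  # pow saves w because backward needs it

# Something else updates the parameter in place before backward runs,
# for example an optimizer step triggered by a different loss.
with torch.no_grad():
    w.mul_(0.5)

try:
    loss.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)  # "... has been modified by an inplace operation ..."

# Fix: recompute the forward pass after the update, then call backward.
loss = (w ** 2).sum()
loss.backward()
```

The gradients themselves are never touched here; the stale saved tensor from the forward pass is what triggers the error.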