Hello, I would appreciate it if somebody could confirm that my understanding of autograd and inplace operations is correct and that my code does not run into issues.
I train a transformer model based on DETR. When I sample tensors from my dataset and do preprocessing, they have requires_grad=False.
I do not change that field before feeding this data to the PyTorch transformer and assume that is fine. I understand that when the samples get processed in the transformer, the results automatically have requires_grad = True, presumably because the network weights are configured that way.
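The propagation described above can be checked with a minimal sketch. It assumes a single `nn.Linear` layer as a hypothetical stand-in for the transformer and random data in place of real samples:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the transformer: a single linear layer whose
# weight and bias are created with requires_grad=True.
model = nn.Linear(4, 2)

x = torch.randn(3, 4)   # a preprocessed sample: requires_grad=False by default
out = model(x)          # the result depends on the trainable parameters

print(x.requires_grad)          # False
print(out.requires_grad)        # True
print(out.grad_fn is not None)  # True: out is part of the autograd graph
```

So there is no need to set requires_grad on the input samples; the output joins the graph through the parameters.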
During my pre-processing and also during post-processing I currently use in-place operations, e.g. tensor[b, idx, 0] = 2 * ((tensor[b, idx, 0] - x_min) / (x_max - x_min)) - 1
I assume this is fine in pre-processing, as the grad flag is still False. During post-processing I want to change my network output before calculating the loss function. Here I figured that I need to clone my output tensor first before reverting the in-place operation above. I then want to write the changed tensor values into the original tensor. Does this work fine with autograd?
Or do I need a workaround that avoids the slicing and in-place operation, for example creating a zero tensor, filling it with the new values, and then using e.g. torch.where() to merge it with the original tensor?
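A minimal sketch of the workflow described in the question, under several assumptions: a single `nn.Linear` layer as a hypothetical stand-in for the DETR model, hypothetical bounds `x_min = 0` and `x_max = 100`, random boxes instead of real samples, and an all-zeros hypothetical target with an MSE loss. It normalizes in place during pre-processing (nothing is tracked there), then clones the network output and reverts the normalization in place on the clone:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_min, x_max = 0.0, 100.0   # hypothetical normalization bounds

# Pre-processing: in-place normalization of coordinate 0 to [-1, 1] on a
# tensor that does not require grad, so autograd records nothing here.
boxes = torch.rand(2, 5, 4) * 100
boxes[..., 0] = 2 * ((boxes[..., 0] - x_min) / (x_max - x_min)) - 1

model = nn.Linear(4, 4)     # hypothetical stand-in for the network
pred = model(boxes)         # pred.requires_grad is True

# Post-processing: clone the output, then revert the normalization in place
# on the clone (the slice assignment is recorded by autograd).
pred = pred.clone()
pred[..., 0] = (pred[..., 0] + 1) / 2 * (x_max - x_min) + x_min

target = torch.zeros_like(pred)        # hypothetical target
loss = ((pred - target) ** 2).mean()
loss.backward()                        # runs without an inplace-modification error
print(model.weight.grad.shape)         # torch.Size([4, 4])
```

If `loss.backward()` completes, autograd was able to differentiate through the slice assignment on the clone.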
Generally, can I not use slicing when using autograd? E.g.:
tensor[:, 0] = 5
or tensor[mask] = 5
It seems very unintuitive to me to always have to create a zero tensor, fill in e.g. the value 5, and then add it to the original one.
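For reference, slice and mask assignment are themselves differentiable operations, as long as the tensor being written to is not a leaf that requires grad. A small sketch with random data (the names `out`, `mask`, and `leaf` are only illustrative):

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4, requires_grad=True)   # leaf tensor
out = x * 2                                 # non-leaf tensor in the graph

# Slice and boolean-mask assignment on a non-leaf tensor are recorded by
# autograd like any other operation.
out[:, 0] = 5
mask = out > 1
out[mask] = 5

out.sum().backward()
# Overwritten entries were replaced by a constant, so they get zero gradient;
# all other entries keep the gradient 2 from "x * 2".
print(x.grad)

# The same kind of assignment directly on a leaf that requires grad is refused.
leaf = torch.zeros(2, requires_grad=True)
leaf_refused = False
try:
    leaf[0] = 5
except RuntimeError:
    leaf_refused = True
print(leaf_refused)   # True
```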
This leads me to my current understanding of autograd:
use clone() when I want to do inplace operations on a tensor with grad history that I want to keep. I can also assign my cloned tensor to the original one, as it has the same grad history.
use detach().clone() when I want to have a copy of my tensor that uses new memory and has no grad history
inplace operations without clone() can cause issues with autograd and my grad results could be wrong even though there is no grad error
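The difference between the first two points can be seen directly; this is a small sketch with random data, where `c` and `d` are illustrative names:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

c = y.clone()            # new memory, still connected to y's grad history
d = y.detach().clone()   # new memory, no grad history at all

print(c.requires_grad, c.grad_fn is not None)   # True True
print(d.requires_grad, d.grad_fn is None)       # False True
print(c.data_ptr() != y.data_ptr(), d.data_ptr() != y.data_ptr())  # True True

c.sum().backward()       # gradients flow back through the clone to x
print(x.grad)            # tensor([2., 2., 2.])
```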
Let me make some comments that are relevant to your questions:
Inplace operations can cause “inplace-modification” errors during
backpropagation, but they don’t necessarily cause such errors.
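Whether an inplace operation causes an error depends on whether backward needs the value that was overwritten. A sketch contrasting two cases, chosen for illustration: the backward of `x * 2` only needs the constant 2, while the backward of `exp()` reuses its own output:

```python
import torch

x = torch.randn(3, requires_grad=True)

# Case 1: no error. The backward of "x * 2" only needs the constant 2, not
# its output, so modifying that output in place is harmless.
y = x * 2
y.add_(1)
y.sum().backward()
grad_case1 = x.grad.clone()
print(grad_case1)   # tensor([2., 2., 2.])

# Case 2: error. The backward of exp() reuses its output (d/dx e^x = e^x),
# so modifying that output in place invalidates what backward needs.
x.grad = None
z = x.exp()
z.add_(1)
error_raised = False
try:
    z.sum().backward()
except RuntimeError as err:
    error_raised = True
    print("RuntimeError:", err)
```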
The following post of mine tries to explain what is going on, with some examples.
Note, your “grad results” won’t be wrong, because if they would have been
wrong, autograd would have raised a “grad error.”
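Roughly, autograd detects this through a per-tensor version counter: each inplace operation increments it, the version is recorded when a tensor is saved for backward, and a mismatch at backward time raises the error. The sketch below reads the counter through `_version`, an internal, underscore-prefixed attribute that is used here only for illustration and is not a public API:

```python
import torch

a = torch.tensor([2.0, 3.0])
versions = [a._version]   # counter of a freshly created tensor

a.mul_(3)                 # in-place op on a
versions.append(a._version)

v = a[0:1]                # a view shares its base's version counter
v.add_(1)                 # in-place op through the view
versions.append(a._version)

# Each in-place op, including one made through a view, bumps the shared
# counter by one.
print(versions)
```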
Tensors without a “grad history” (that is, with requires_grad = False)
are not immune to inplace-modification errors. A tensor whose own gradient
is not being computed can still be used in the computation of some other
tensor (and be saved by autograd for that computation's backward pass). Consider:
>>> import torch
>>> a = torch.tensor ([2])   # a tensor with requires_grad = False
>>> t = torch.ones (1, requires_grad = True)
>>> l = a * t
>>> l.backward()
>>> t.grad
tensor([2.])
>>> t.grad = None
>>> l = a * t
>>> a[0] = 99   # modify inplace
>>> l.backward()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch_install>\torch\_tensor.py", line 487, in backward
  File "<path_to_pytorch_install>\torch\autograd\__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.LongTensor [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Yes, if you need to modify a tensor inplace and doing so causes an
inplace-modification error during backpropagation, then cloning the
tensor before modifying it is often an appropriate solution.
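Applied to the example above, a sketch of the cloning fix, assuming the goal is a modified copy of `a` while `l` still depends on the original values (`a_mod` is an illustrative name):

```python
import torch

a = torch.tensor([2])                   # requires_grad=False, saved by a * t
t = torch.ones(1, requires_grad=True)
l = a * t

# Instead of a[0] = 99, which would break the backward pass,
# modify a clone and leave the saved tensor untouched.
a_mod = a.clone()
a_mod[0] = 99

l.backward()
print(t.grad)   # tensor([2.]): computed with the original value of a
```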
Thank you for your answer and examples; they helped a lot.
I take the following away from this, please correct me if I am wrong:
If PyTorch does not throw an error regarding inplace-modifications, there is nothing for me to worry about, even if I use inplace-operations.
When an inplace-error does happen, I can circumvent it by cloning the tensor, applying my in-place operations, and then writing the cloned and modified tensor to the variable that used to contain my original tensor, without messing anything up regarding the gradient, just like I do in my example.
It is generally good style in PyTorch to avoid in-place operations on tensors with requires_grad = True. It is better to use a few more lines of code and, e.g., torch.where() to replace values in tensors.
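For comparison, a sketch showing that the two approaches give the same values and the same gradients for a small hand-picked input; `a` and `b` are illustrative names, and `retain_graph=True` is needed because both results share the `x * 3` part of the graph:

```python
import torch

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
out = x * 3              # [-3.0, 1.5, 6.0]
mask = out > 1           # positions to overwrite (bool tensor, no grad)

# Option A: clone, then replace values in place.
a = out.clone()
a[mask] = 5.0

# Option B: build a new tensor out of place with torch.where.
b = torch.where(mask, torch.full_like(out, 5.0), out)

print(torch.equal(a, b))   # True

grad_a, = torch.autograd.grad(a.sum(), x, retain_graph=True)
grad_b, = torch.autograd.grad(b.sum(), x)
print(grad_a, grad_b)      # tensor([3., 0., 0.]) tensor([3., 0., 0.])
```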
Correct. But to clarify, you have a named “variable” in your python script
that refers (as a python reference) to some tensor that autograd needs
(and autograd also keeps its own reference to this tensor). You clone
the tensor and your reference (the named “variable”) and autograd’s
reference both still refer to the original tensor. When you “write the cloned
and modified tensor to the variable” you are telling the python interpreter
to reuse that name and set it to now refer to a different tensor – the cloned
and modified tensor. (Note, you could have set that “variable” to refer, for
example, to some python list that has nothing to do with pytorch.)
Setting your named “variable” to refer to something else (or, for that matter,
to None) has no effect on the original tensor nor on autograd’s reference to
the original tensor.
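A sketch of this name rebinding, using `exp()` because autograd saves its output for the backward pass (modifying `y` in place without the clone would raise the inplace-modification error):

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
y = x.exp()    # autograd keeps its own reference to this output for backward

# Rebind the name "y" to a clone and modify the clone in place. The tensor
# autograd saved is a different object and is left untouched.
y = y.clone()
y[0] = 0.0

y.sum().backward()
print(x.grad)  # first entry 0 (overwritten by a constant), then e**1 and e**2
```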
It’s up to you whether you want to use a new variable name or reuse the
original variable name to refer to the cloned and modified tensor – it doesn’t
affect the logic of the code. I would make that choice based on what makes
the code the most readable.
I wouldn’t necessarily agree. Creating a new tensor to avoid an inplace
operation costs memory. Furthermore, if your cloned and modified tensor will
be part of the backward pass, then it too will carry requires_grad = True.
So your rule would have prevented you from modifying it inplace.
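The memory point can be illustrated by comparing storage pointers: an inplace replacement writes into the existing storage, while `torch.where()` allocates a new result (plus, here, the zeros tensor passed to it). Both results still carry requires_grad = True. The tensor size is arbitrary:

```python
import torch

x = torch.randn(1000, requires_grad=True)
out = x * 2                    # non-leaf, requires_grad=True
ptr_before = out.data_ptr()
mask = out > 0

# In-place replacement writes into out's existing storage.
out[mask] = 0.0
same_storage = out.data_ptr() == ptr_before

# Out-of-place replacement with torch.where allocates a new result tensor.
new = torch.where(mask, torch.zeros_like(out), out)
new_storage = new.data_ptr() != out.data_ptr()

print(out.requires_grad, new.requires_grad)  # True True
print(same_storage, new_storage)             # True True
```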
My general approach is to go ahead and use inplace modifications (where
appropriate) unless I think it will lead to an inplace-modification error during
the backward pass. If I overlook such a case, autograd reports it by raising
that error, and I go back and fix it.
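When such an error does show up, the hint at the end of the error message above suggests anomaly detection for locating the offending operation. A sketch using the `torch.autograd.detect_anomaly()` context manager, with `sigmoid()` chosen only because it saves its output for backward:

```python
import torch

x = torch.randn(3, requires_grad=True)
caught = None

# detect_anomaly() records a forward-pass traceback for each autograd node,
# so when backward fails, the forward operation behind the failing node is
# reported along with the error.
with torch.autograd.detect_anomaly():
    z = torch.sigmoid(x)   # sigmoid saves its output for the backward pass
    z.mul_(2)              # ...and this in-place op modifies that output
    try:
        z.sum().backward()
    except RuntimeError as err:
        caught = err

print(type(caught).__name__)   # RuntimeError
```

Anomaly detection slows things down, so it is usually enabled only while debugging.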
torch.where() is a useful tool, but be aware that it has a booby-trap in it.
If the “path not taken” in the torch.where() call has nans in it, those nans
will pollute the result of torch.where(), even though they don’t logically