I came across the term "in-place operation" in the documentation: http://pytorch.org/docs/master/notes/autograd.html
What does it mean?
Hi,
An in-place operation is an operation that directly changes the content of a given Tensor without making a copy. In-place operations in PyTorch are always suffixed with an underscore, like .add_() or .scatter_(). Python augmented-assignment operations like += or *= are also in-place operations.
I initially found in-place operations in the following PyTorch tutorial:
Adding two tensors
>>> import torch
>>> x = torch.rand(1)
>>> x
0.2362
[torch.FloatTensor of size 1]
>>> y = torch.rand(1)
>>> y
0.7030
[torch.FloatTensor of size 1]
Normal addition
# Addition of two tensors creates a new tensor.
>>> x + y
0.9392
[torch.FloatTensor of size 1]
# The value of x is unchanged.
>>> x
0.2362
[torch.FloatTensor of size 1]
In-place addition
# An in-place addition modifies one of the tensors itself, here the value of x.
>>> x.add_(y)
0.9392
[torch.FloatTensor of size 1]
>>> x
0.9392
[torch.FloatTensor of size 1]
So in this tutorial, is the way the network is constructed in the forward method an in-place operation?
http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py
I understand that x.add_(y) is an in-place operation. Is x = x + y in-place, and will it cause any problem for autograd?
Hi,
No, x = x + y is not in-place. x += y is in-place.
What is the difference? They both modify x, don't they?
Yes, it's true that they both modify x. But an in-place operation does not allocate new memory for x.
E.g., normal operation vs. in-place operation:
>>> x = torch.rand(1)
>>> y = torch.rand(1)
>>> x
tensor([0.2738])
>>> id(x)
140736259305336
>>> x = x + y # Normal operation
>>> id(x)
140726604827672 # New location
>>> x += y
>>> id(x)
140726604827672 # Existing location used (in-place)
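As a complementary check (a sketch of mine, not from the original reply), Tensor.data_ptr() reports the address of the tensor's underlying storage, which is a more direct indicator than the Python object id:
>>> import torch
>>> x = torch.rand(1)
>>> y = torch.rand(1)
>>> ptr = x.data_ptr()   # address of x's storage
>>> x = x + y            # normal op: result lives in newly allocated storage
>>> x.data_ptr() == ptr
False
>>> ptr = x.data_ptr()
>>> x += y               # in-place op: result written into the same storage
>>> x.data_ptr() == ptr
True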
Thanks, that makes sense.
Thanks, this is a good explanation.
Which one is faster, in-place or normal operation?
That depends on what you mean by faster.
It does the same amount of computation, so that part does not change.
But since there are fewer memory accesses, it can lead to a speed-up if your task is bound by memory bandwidth (which is quite often the case on GPU).
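As a rough illustration (my own sketch; results depend heavily on hardware and tensor size), the two can be compared with timeit. Any speed-up for the in-place version mainly reflects the avoided allocation and extra memory traffic:
import timeit
import torch

x = torch.rand(10_000_000)
y = torch.rand(10_000_000)

# Normal op: allocates a new output tensor on every call.
t_normal = timeit.timeit(lambda: x + y, number=100)
# In-place op: writes the result back into x's existing storage.
t_inplace = timeit.timeit(lambda: x.add_(y), number=100)

print(f"normal:   {t_normal:.3f}s")
print(f"in-place: {t_inplace:.3f}s")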
Are in-place operations added to the computation graph / tracked by autograd?
This experiment makes me think yes, since a_clone has <MulBackward0> as its grad_fn.
Let me know if this conclusion is correct:
def clone_playground():
    import torch
    a = torch.tensor([1, 2, 3.], requires_grad=True)
    a_clone = a.clone()
    print(f'a is a_clone = {a is a_clone}')
    print(f'a == a_clone = {a == a_clone}')
    print(f'a = {a}')
    print(f'a_clone = {a_clone}')
    # a_clone.fill_(2)
    a_clone.mul_(2)   # in-place op on the clone
    print(f'a = {a}')
    print(f'a_clone = {a_clone}')
    a_clone.sum().backward()
output:
a is a_clone = False
a == a_clone = tensor([True, True, True])
a = tensor([1., 2., 3.], requires_grad=True)
a_clone = tensor([1., 2., 3.], grad_fn=<CloneBackward>)
a = tensor([1., 2., 3.], requires_grad=True)
a_clone = tensor([2., 4., 6.], grad_fn=<MulBackward0>)
from here: Clone and detach in v0.4.0
I’ll answer there to keep all the discussion in a single place.
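For reference, here is a minimal sketch (my own illustration, not from the linked thread) showing that autograd does track in-place operations: each tensor carries a version counter, and if an in-place op overwrites a tensor that the backward pass needs, backward() raises a RuntimeError (the exact message varies by version):
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.sigmoid()      # sigmoid saves its output for the backward pass
b.mul_(2)            # in-place op on b bumps its version counter
b.sum().backward()   # raises RuntimeError: a variable needed for gradient
                     # computation has been modified by an inplace operation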
Made a summary here. Hope it is helpful:
Best practice: avoid in-place operations unless necessary, as they silently change the state of tensors. Non-in-place operations make a copy before performing the operation. Thus, if an operation inside a function is in-place, it affects the tensor's state outside the function, while a non-in-place operation does not change that state unless you reassign the result outside the function.
e.g.
import torch

def inplace_op(X):
    X += 1          # modifies the caller's tensor in place
    return X

X = torch.rand(4, 2)
inplace_op(X)  # X is changed even though the result is never reassigned to X
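For contrast, a non-in-place counterpart (the name outplace_op is just for illustration) leaves the caller's tensor untouched unless you reassign the returned value:
def outplace_op(X):
    return X + 1            # allocates a new tensor; the caller's X is untouched

X = torch.rand(4, 2)
outplace_op(X)              # X is unchanged
X = outplace_op(X)          # X now refers to the newly allocated result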
Summary of in-place operations:
- X *= 3
- X[…] = …
- X.add_(1)
Common examples of in-place operations:
- X += 1, X *= 3, …
- X[2] = 2, X[0, 0] = 2
- X[:, 3] = 3
- X[2] /= 5
- X[:, 3] /= 4
- X[:, 3] = X[:, 3] / 4
These are not in-place operations on x:
- x = x + 1
- y = x.clone(); y[0] += 100
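A quick check of the last point (a small sketch): clone() returns an independent copy, so the in-place edit touches only y:
import torch

x = torch.tensor([1., 2., 3.])
y = x.clone()      # independent copy with its own storage
y[0] += 100        # in-place on y only
print(x)           # tensor([1., 2., 3.])   -> x is unchanged
print(y)           # tensor([101., 2., 3.])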
Your examples saved me days of debugging. Thanks!