I came across two different ways of updating a tensor, and I don't understand why they give different results, nor in which case one or the other is meant to be used.
Case 1:
>>> a = torch.Tensor([1.0,2.5,-2.0])
>>> x = Variable(a, requires_grad=True)
>>> prev_x = x.data
>>> x
Variable containing:
1.0000
2.5000
-2.0000
[torch.FloatTensor of size 3]
>>> prev_x
1.0000
2.5000
-2.0000
[torch.FloatTensor of size 3]
>>> x.data.sub_(1)
0.0000
1.5000
-3.0000
[torch.FloatTensor of size 3]
>>> prev_x
0.0000
1.5000
-3.0000
[torch.FloatTensor of size 3]
Case 2:
>>> a = torch.Tensor([1.0,2.5,-2.0])
>>> x = Variable(a, requires_grad=True)
>>> prev_x = x.data
>>> x
Variable containing:
1.0000
2.5000
-2.0000
[torch.FloatTensor of size 3]
>>> prev_x
1.0000
2.5000
-2.0000
[torch.FloatTensor of size 3]
>>> x.data = x.data - 1
>>> x
Variable containing:
0.0000
1.5000
-3.0000
[torch.FloatTensor of size 3]
>>> prev_x
1.0000
2.5000
-2.0000
[torch.FloatTensor of size 3]
First of all, I understand that in order to make a copy of a tensor, I should write prev_x = x.data.clone().
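For illustration, here is a minimal sketch of the difference (the names b, b_ref and b_copy are mine):

import torch

b = torch.Tensor([1.0, 2.5, -2.0])
b_ref = b            # plain assignment: a second name for the SAME tensor
b_copy = b.clone()   # clone(): a new tensor with its own copy of the data
b.sub_(1)            # modify b in place
print(b_ref)         # 0.0, 1.5, -3.0 -- b_ref follows b
print(b_copy)        # 1.0, 2.5, -2.0 -- the clone is unaffected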
But I do not understand why the default behavior when we do prev_x = x.data is referencing and not copying, ~~like in Numpy (AFAIU, prev_x and x.data are both tensors)~~. (EDIT: see my answer below.)
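To make the question concrete: as far as I understand, assignment in Python binds a name to an existing object rather than copying it, and tensors are no exception (a small check, names are mine):

import torch

t = torch.Tensor([1.0, 2.5, -2.0])
alias = t            # binds the name "alias" to the very same tensor object
print(alias is t)    # True: no data was copied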
Second, in case 1, the "_" suffix of the .sub_() method indicates that it operates in place. It is natural then that x is changed, and since prev_x = x.data is a reference, prev_x is changed too.
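This can be verified by comparing storage addresses with data_ptr() (a small sketch re-creating case 1):

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
prev_x = x.data
print(prev_x.data_ptr() == x.data.data_ptr())  # True: same underlying storage
x.data.sub_(1)                                 # writes into that shared storage
print(prev_x.data_ptr() == x.data.data_ptr())  # still True, so prev_x sees the change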
But in case 2, I update x.data directly, with something I would have considered "in-place": x.data = x.data - 1. Yet this time, prev_x does not change.
Could someone explain this behavior?
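From comparing storage addresses, it looks like the subtraction allocates a new tensor (a small sketch re-creating case 2; old_ptr is a name I made up):

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
prev_x = x.data
old_ptr = x.data.data_ptr()
x.data = x.data - 1                  # "x.data - 1" allocates a NEW tensor,
                                     # and the assignment rebinds x.data to it
print(x.data.data_ptr() == old_ptr)  # False: x.data now lives in fresh storage
print(prev_x.data_ptr() == old_ptr)  # True: prev_x still holds the old tensor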
I know that we shouldn't use in-place methods ("Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases"), and I don't want to use them; I found one in the update step of the Adadelta optimizer, and that is what led me to stumble on this.
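For context, the update I stumbled on follows roughly this in-place pattern (an illustrative sketch, not the actual Adadelta code):

import torch
from torch.autograd import Variable

p = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
p.sum().backward()               # give p some gradient to work with
lr = 0.1
# Updating p.data in place modifies the parameter's existing storage, so
# every reference to the parameter keeps seeing the updated values:
p.data.add_(-lr * p.grad.data)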
Another thing that is unclear to me is when to use the different notations for basic operations (see the sketch after the list), meaning:
- the actual symbols +, -, *, /, e.g. a = a + b
- the out-of-place methods add(), sub(), e.g. a = a.add(b)
- the in-place methods add_(), sub_(), e.g. a.add_(b)
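To illustrate, the three spellings of the same subtraction (as far as I understand, the first two allocate a new result tensor, while the third overwrites a's storage):

import torch

a = torch.Tensor([1.0, 2.5, -2.0])
b = torch.Tensor([1.0, 1.0, 1.0])

c = a - b        # operator: returns a new tensor, a is untouched
d = a.sub(b)     # out-of-place method: also returns a new tensor
a.sub_(b)        # in-place method: overwrites a's own storage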
Thank you!