Variable and tensor update behavior

I came across two different behaviors when updating a tensor, and I don’t understand why they give different results, or in which case one or the other is meant to be used.

Case 1:

>>> a = torch.Tensor([1.0,2.5,-2.0])
>>> x = Variable(a, requires_grad=True)
>>> prev_x = x.data

>>> x
Variable containing:
 1.0000
 2.5000
-2.0000
[torch.FloatTensor of size 3]

>>> prev_x
 1.0000
 2.5000
-2.0000
[torch.FloatTensor of size 3]

>>> x.data.sub_(1)
 0.0000
 1.5000
-3.0000
[torch.FloatTensor of size 3]

>>> prev_x
 0.0000
 1.5000
-3.0000
[torch.FloatTensor of size 3]

Case 2:

>>> a = torch.Tensor([1.0,2.5,-2.0])
>>> x = Variable(a, requires_grad=True)
>>> prev_x = x.data

>>> x
Variable containing:
 1.0000
 2.5000
-2.0000
[torch.FloatTensor of size 3]

>>> prev_x
 1.0000
 2.5000
-2.0000
[torch.FloatTensor of size 3]

>>> x.data = x.data - 1

>>> x
Variable containing:
 0.0000
 1.5000
-3.0000
[torch.FloatTensor of size 3]

>>> prev_x
 1.0000
 2.5000
-2.0000
[torch.FloatTensor of size 3]

First of all, I understand that in order to make a copy of a tensor, I should write prev_x = x.data.clone().
But I do not understand why the default behavior when we do prev_x = x.data is referencing and not copying, ~~like in Numpy (AFAIU, prev_x and x.data are both tensors)~~. (EDIT: see my answer below)
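
For reference, here is what the clone() version does (a quick sketch with the same values as above):

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
prev_x = x.data.clone()   # real copy: prev_x gets its own storage

x.data.sub_(1)            # in-place update of x.data

print(x.data)             # 0.0, 1.5, -3.0
print(prev_x)             # still 1.0, 2.5, -2.0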

Second, in case 1, the “_” suffix of the .sub_() method indicates that it operates in-place.
It is natural then that x is changed, and since prev_x = x.data is a reference, prev_x changes too.

But in case 2, I update x.data directly with something I would also consider “in-place”: x.data = x.data - 1, yet this time prev_x doesn’t change.

Could someone explain that behavior?

I know that we shouldn’t use in-place methods (“Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases”). I don’t particularly want to use them; I found this pattern in the update step of the Adadelta optimizer, and that is what led me to stumble on this.
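
For context, the kind of in-place update I mean looks roughly like the following (a simplified sketch of a generic gradient step, not the actual Adadelta code; w and lr are just names I made up here):

import torch
from torch.autograd import Variable

w = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
loss = (w * w).sum()
loss.backward()

lr = 0.1
# In-place update: w.data keeps the same storage, so any other reference
# to w.data (like prev_x above) would see the new values.
w.data.sub_(lr * w.grad.data)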

Another thing that is unclear to me is when to use the different notations for basic operations, meaning:

  • Actual symbols +, -, *, / e.g. a = a + b
  • Out-of-place methods add(), sub() e.g. a = a.add(b)
  • In-place methods add_(), sub_() e.g. a.add_(b)

Thank you!


Here’s a response to your second question:
If you look here: https://github.com/pytorch/pytorch/blob/fa8044d92f8a77b8008ca5e295a341abb9d26f13/torch/tensor.py#L292 ,

The actual symbols (+, -, *, / etc.) and their equivalent methods (add(), sub(), etc.) do the same thing,
i.e. a + b calls a.add(b) under the hood (look at the __add__ method),
and a += 1 calls a.add_(1) (look at the __iadd__ method).
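
Here’s a quick check of that equivalence (a minimal sketch; any version should show the same thing):

import torch

a = torch.Tensor([1.0, 2.5, -2.0])
b = torch.Tensor([1.0, 1.0, 1.0])

# The operator and the out-of-place method produce equal results in fresh tensors.
print(torch.equal(a + b, a.add(b)))   # True

# The in-place method mutates `a` itself instead of returning a new tensor.
c = a.add_(b)
print(c is a)                         # True: add_ returns the same tensor
print(a)                              # a now holds the summed values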

And here’s a reply to your first question:

I think this is what is happening under the hood.

x.data has a pointer to data somewhere (via x.data.data_ptr() or x.data.storage()). Let’s call that data S for storage.
When you do prev_x = x.data, prev_x is being created with a pointer to S as well.
So both prev_x and x.data use S as their storage under the hood.
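
You can check that sharing directly (a minimal sketch, using the same setup as your post):

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1.0, 2.5, -2.0]), requires_grad=True)
prev_x = x.data

# Both names point at the same underlying storage S.
print(prev_x.data_ptr() == x.data.data_ptr())   # True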

When you modify x.data in-place via x.data.add_(...), it updates S, so you see the changes in prev_x.
When you instead do x.data = x.data + 1, you’re creating a new storage S' for x.data.

The reason is that x.data = x.data + 1 is equivalent to x.data = x.data.add(1).
x.data.add(1) creates a tensor with a new storage S' (because it is not in-place), which is then assigned to x.data.
Meanwhile prev_x still uses S as its underlying storage.
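
Continuing the sketch above, you can watch S being replaced by S':

# In-place: the shared storage S is modified, so prev_x sees the change.
x.data.sub_(1)
print(prev_x.data_ptr() == x.data.data_ptr())   # True  -> still the same S
print(prev_x)                                    # reflects the subtraction

# Out-of-place + rebinding: x.data now points at a new storage S'.
x.data = x.data - 1
print(prev_x.data_ptr() == x.data.data_ptr())   # False -> x.data uses S'
print(prev_x)                                    # unchanged by this step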

Thank you for your answers!
I just tried x.data += 1 and it modifies x.data in-place.
I am surprised to see that x.data += 1 is not equivalent to x.data = x.data + 1.
But I guess that is inherent to Python and not PyTorch, as seen here.
The default behavior in Python is always referencing; it’s just that x += 1 modifies the tensor in-place while x = x + 1 rebinds the name to a new tensor.
So I was wrong in my question, saying that

default behavior when we do prev_x = x.data is referencing and not copying, like in Numpy

Yeah, it’s a Python thing that x.data += 1 isn’t equivalent to x.data = x.data + 1, based on how the __add__ (+) and __iadd__ (+=) operators work.
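
A small illustration of that difference (a minimal sketch; the behavior is the same for any tensor):

import torch

x = torch.Tensor([1.0, 2.5, -2.0])
before = x.data_ptr()

x += 1                          # __iadd__: mutates x in place
print(x.data_ptr() == before)   # True  -> same storage

x = x + 1                       # __add__ then rebinding: x is a brand-new tensor
print(x.data_ptr() == before)   # False -> new storage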