In the first case the operation is performed in place, so the Python object stays the same, while in the second case a new object is created.
To give an example:
import torch

a = torch.rand(3)
print(id(a))
b = a # same python object
print(id(b))
b = b - 1
print(id(b)) # object reference changed!
a -= 1 # in-place operation doesn't change object
print(id(a)) # still the same object
Thanks for the help! Just for completeness, I will try to address my question with the best solution I know so far:
W.data.copy_(new_value.data)
Not sure if this is good, or whether there are advantages and disadvantages to it, but I'm going to leave it here for future people to benefit from (and/or discuss).
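For anyone landing here later, here is a minimal sketch of what that update could look like inside a full step. This is written against modern PyTorch (in the original thread W would be a Variable), and the names eta, x, y and the loss are made up purely for illustration:

import torch

eta = 0.1  # made-up learning rate
W = torch.randn(3, 2, requires_grad=True)
x = torch.randn(5, 3)
y = torch.randn(5, 2)

loss = ((x @ W - y) ** 2).mean()
loss.backward()

# Compute the new value outside the graph and copy it into W in place:
# W stays the same Python object and the update itself is not recorded by autograd.
W.data.copy_(W.data - eta * W.grad.data)
W.grad.zero_()   # clear the gradient before the next backward pass
print(id(W))     # same object as before the update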
I guess it's a little sad that:
W = W - eta*W.grad
doesn't work, because now it looks less like the maths and is a bit harder to read, but eh, I'm being a bit pedantic…
Now, this is just me being curious: is the fact that x = x + x rebinds the name to a new object (new id) while x += x operates in place a feature of Python or a feature of PyTorch? Like, could x = x + x have been made equivalent to x += x if the developers of PyTorch wanted? Just curious.
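As a quick illustration of the behaviour being asked about (a minimal sketch): the split comes from Python's __add__/__iadd__ protocol, and torch.Tensor chooses to implement __iadd__ as an in-place operation:

import torch

x = torch.ones(3)
print(id(x))
x = x + x     # calls x.__add__(x): a new tensor is created and the name is rebound
print(id(x))  # different id

x = torch.ones(3)
print(id(x))
x += x        # calls x.__iadd__(x): torch.Tensor implements this in place
print(id(x))  # same id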
If you were to do W = W - eta * W.grad, you would still be storing the history of the computations for the update, which is normally not what you want to do.
The fact that it works fine is a feature (as mentioned by @apaszke in slack), but there are reasons why it wouldn't necessarily be the case. W is a Variable that holds a tensor in W.data. Now, what happens if you change the tensor that W originally points to by doing W.data = new_tensor? W now points to new_tensor, even though W is a Variable that was supposed to represent the original tensor.
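To make the first point concrete, here is a minimal sketch (written against modern PyTorch, where Variables and tensors are merged; the names are made up) contrasting the two kinds of update:

import torch

eta = 0.1
W = torch.randn(3, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()

# Out-of-place update: the subtraction itself is recorded by autograd,
# so the resulting tensor carries the history of the update.
W_new = W - eta * W.grad
print(W_new.grad_fn)   # <SubBackward0 ...>: the update is part of the graph

# Updating without recording history keeps W a plain leaf.
with torch.no_grad():
    W -= eta * W.grad
print(W.grad_fn)       # None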
sorry last question. Whats wrong with storing the history of the computations? (you also mentioned that in the slack didn’t quite catch what was wrong with that)
If you store the history of computations indefinitely, your computation graph will grow bigger at every iteration and you will never free memory, leading to out-of-memory issues.
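A toy sketch of that growth (again modern PyTorch; graph_depth is a made-up helper that walks grad_fn.next_functions, an internal attribute, just to count graph nodes, and the gradients are replaced by random stand-ins):

import torch

def graph_depth(t):
    # Walk the autograd graph backwards along the first input of each node.
    node, depth = t.grad_fn, 0
    while node is not None:
        depth += 1
        node = node.next_functions[0][0] if node.next_functions else None
    return depth

eta = 0.1
W = torch.randn(3, requires_grad=True)

# Out-of-place updates with autograd enabled: every new W chains a fresh
# SubBackward0 node onto the old graph, which therefore can never be freed.
for step in range(4):
    grad = torch.randn(3)      # stand-in for a computed gradient
    W = W - eta * grad
    print(graph_depth(W))      # keeps growing: 2, 3, 4, 5

# In-place update under torch.no_grad(): nothing is recorded, W stays a leaf.
W = torch.randn(3, requires_grad=True)
for step in range(4):
    grad = torch.randn(3)
    with torch.no_grad():
        W -= eta * grad
    print(graph_depth(W))      # stays 0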