What is the recommended way to re-assign/update values in a variable (or tensor)?

Why are the following:

w.data = w.data - eta*w.grad.data

or

w = w - eta*w.grad

not recommended?

I actually got this from the official tutorials:


for param in model.parameters():
    param.data -= learning_rate * param.grad.data


w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data

Note that there is a difference between doing

param.data -= learning_rate * param.grad.data

and

param.data = param.data - learning_rate * param.grad.data

In the first case the operation is performed in-place, so the Python object stays the same, while in the second one you create a new object.

To give an example:

a = torch.rand(3)
b = a          # b and a refer to the same Python object
print(id(b))
b = b - 1      # creates a new tensor and rebinds b
print(id(b))   # object reference changed!
a -= 1         # in-place operation keeps the same object
print(id(a))   # still the same object as before

Thanks for the help! Just for completeness I will try to address my question with the best solution I know so far:


I'm not sure if this is good, or if there are advantages and disadvantages to it, but I'm going to leave it here for future readers to benefit from (and/or discuss).

I guess it's a little sad that:

W = W - eta*W.grad

doesn't work, because now the update looks less like the maths and is a bit harder to read, but I'm being a bit pedantic…


Now, this is just me being curious: is the fact that x = x + x re-binds the name to a new object, while x += x works in place, a feature of Python or a feature of PyTorch? Could x = x + x have been made equivalent to x += x if the developers of PyTorch had wanted? Just curious.

Sorry for being so insistent, but right now I am just immensely curious why:

W.data = W.data - eta*W.grad.data

would be a bad idea. It seems to work fine when I try it (maybe because it's not in-place?)

If you were to do W = W - eta * W.grad, you would still be storing the history of the computations for the update, which is not normally what you want to do.
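To make this concrete, here is a minimal sketch (names and the toy loss are my own, written against a recent PyTorch in which Variable has been merged into Tensor) showing that a plain re-assignment records the update in the autograd history, while an update inside torch.no_grad() does not:

```python
import torch

# A toy parameter with a gradient.
W = torch.randn(3, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()

eta = 0.1

# Re-assigning keeps the update itself in the autograd history:
W_new = W - eta * W.grad
print(W_new.grad_fn)  # not None: the subtraction was recorded

# Performing the update inside torch.no_grad() records nothing:
with torch.no_grad():
    W -= eta * W.grad
print(W.grad_fn)  # None: W is still a leaf with no history
```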


This is something from Python, not specific to PyTorch. The same behaviour is present in NumPy.
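As a plain-Python illustration (no PyTorch involved): x += y calls __iadd__ when the type defines it, which may mutate the object in place, while x = x + y calls __add__ and rebinds the name to a new object:

```python
# Plain Python: lists define __iadd__ (in-place) as well as __add__.
x = [1, 2]
before = id(x)

x += [3]        # calls list.__iadd__: mutates the same object
assert id(x) == before

x = x + [4]     # calls list.__add__: builds a new list, rebinds x
assert id(x) != before
```

This also answers the follow-up question above: assignment itself is pure Python name binding, which a library cannot override, so x = x + x could not have been made in-place by the PyTorch developers; only the behaviour of the + and += operators is under their control.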


You mean if you did:

W.data = W.data - eta*W.grad.data

? If you did W = W - eta * W.grad, that would create new Variables, so the previous history would be thrown away, no?


The fact that it works fine is a feature (as mentioned by @apaszke in slack), but there are reasons why it wouldn't necessarily be the case.
W is a Variable that holds a tensor in W.data. Now, what happens if you change the tensor that W originally points to by doing W.data = new_tensor? W now points to new_tensor, but W is a Variable that was supposed to represent the original tensor.


Sorry, last question: what's wrong with storing the history of the computations? (You also mentioned that in the slack; I didn't quite catch what was wrong with it.)

If you store the history of computations indefinitely, your computation graph will grow bigger at every iteration and you will never free memory, leading to out-of-memory issues.
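A minimal sketch of the failure mode and of the usual fix (the variable names and toy loss are my own):

```python
import torch

eta = 0.01
W = torch.randn(5, requires_grad=True)

for step in range(3):
    loss = (W ** 2).sum()
    loss.backward()

    # BAD: the subtraction below is itself recorded by autograd, so
    # each iteration's graph hangs off the previous one and the old
    # graphs are never freed.
    # W = W - eta * W.grad

    # GOOD: update outside the graph, then clear the gradient so it
    # does not accumulate across iterations.
    with torch.no_grad():
        W -= eta * W.grad
        W.grad.zero_()
```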


I want to add, for anyone getting to this place, that the quoted code

doesn't work, since Tensors do not have a copy method.

Instead, use copy_:

Notice the underscore!
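For reference, a minimal sketch of the in-place copy (my own example, assuming you want to overwrite one tensor's values with another's):

```python
import torch

src = torch.ones(3)
dst = torch.zeros(3)

# copy_ overwrites dst's values in place; the Python object (and any
# references to it) stays the same.
before = id(dst)
dst.copy_(src)

print(dst)  # tensor([1., 1., 1.])
assert id(dst) == before
```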

I have a similar question. My model is trained, and at test time I want to multiply the values of a particular conv layer. How do I do that?