In the first case the operation is performed in place, so the Python object stays the same, while in the second case a new object is created.
To give an example:
import torch

a = torch.rand(3)
print(id(a))
b = a # same python object
print(id(b))
b = b - 1
print(id(b)) # object reference changed!
a -= 1 # in-place operation doesn't change object
print(id(a)) # still the same object
Thanks for the help! Just for completeness, I will try to address my question with the best solution I know so far:
W.data.copy_(new_value.data)
Not sure if this is good, or whether there are advantages and disadvantages to it, but I'm going to leave it here for future people to benefit from (and/or discuss).
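For anyone landing here later, here is a minimal sketch of what that update could look like inside a full step. This is written against modern PyTorch (in the original thread W would be a Variable), and the names eta, x, y and the loss are made up purely for illustration:

import torch

eta = 0.1  # made-up learning rate
W = torch.randn(3, 2, requires_grad=True)
x = torch.randn(5, 3)
y = torch.randn(5, 2)

loss = ((x @ W - y) ** 2).mean()
loss.backward()

# Compute the new value outside the graph and copy it into W in place:
# W stays the same Python object and the update itself is not recorded by autograd.
W.data.copy_(W.data - eta * W.grad.data)
W.grad.zero_()   # clear the gradient before the next backward pass
print(id(W))     # same object as before the update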
I guess it's a little sad that:
W = W - eta*W.grad
doesn't work, because now it looks less like the maths and is a bit harder to read, but eh, I'm being a bit pedantic…
Now, this is just me being curious: is the fact that x = x + x rebinds the name to a new object (new id) while x += x operates in place a feature of Python or a feature of PyTorch? Like, could x = x + x have been made equivalent to x += x if the developers of PyTorch wanted? Just curious.
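As a quick illustration of the behaviour being asked about (a minimal sketch): the split comes from Python's __add__/__iadd__ protocol, and torch.Tensor chooses to implement __iadd__ as an in-place operation:

import torch

x = torch.ones(3)
print(id(x))
x = x + x     # calls x.__add__(x): a new tensor is created and the name is rebound
print(id(x))  # different id

x = torch.ones(3)
print(id(x))
x += x        # calls x.__iadd__(x): torch.Tensor implements this in place
print(id(x))  # same id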
If you were to do W = W - eta * W.grad, you would still be storing the history of the computations for the update, which is normally not what you want to do.
The fact that it works fine is a feature (as mentioned by @apaszke in slack), but there are reasons why it wouldn't necessarily be the case. W is a Variable that holds a tensor in W.data. Now, what happens if you change the tensor that W originally points to by doing W.data = new_tensor? W now points to new_tensor, even though W is a Variable that was supposed to represent the original tensor.
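To make the first point concrete, here is a minimal sketch (written against modern PyTorch, where Variables and tensors are merged; the names are made up) contrasting the two kinds of update:

import torch

eta = 0.1
W = torch.randn(3, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()

# Out-of-place update: the subtraction itself is recorded by autograd,
# so the resulting tensor carries the history of the update.
W_new = W - eta * W.grad
print(W_new.grad_fn)   # <SubBackward0 ...>: the update is part of the graph

# Updating without recording history keeps W a plain leaf.
with torch.no_grad():
    W -= eta * W.grad
print(W.grad_fn)       # None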
sorry last question. Whats wrong with storing the history of the computations? (you also mentioned that in the slack didn’t quite catch what was wrong with that)
If you store the history of computations indefinitely, your computation graph will grow bigger at every iteration and you will never free memory, leading to out-of-memory issues.
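A toy sketch of that growth (again modern PyTorch; graph_depth is a made-up helper that walks grad_fn.next_functions, an internal attribute, just to count graph nodes, and the gradients are replaced by random stand-ins):

import torch

def graph_depth(t):
    # Walk the autograd graph backwards along the first input of each node.
    node, depth = t.grad_fn, 0
    while node is not None:
        depth += 1
        node = node.next_functions[0][0] if node.next_functions else None
    return depth

eta = 0.1
W = torch.randn(3, requires_grad=True)

# Out-of-place updates with autograd enabled: every new W chains a fresh
# SubBackward0 node onto the old graph, which therefore can never be freed.
for step in range(4):
    grad = torch.randn(3)      # stand-in for a computed gradient
    W = W - eta * grad
    print(graph_depth(W))      # keeps growing: 2, 3, 4, 5

# In-place update under torch.no_grad(): nothing is recorded, W stays a leaf.
W = torch.randn(3, requires_grad=True)
for step in range(4):
    grad = torch.randn(3)
    with torch.no_grad():
        W -= eta * grad
    print(graph_depth(W))      # stays 0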