What is the recommended way to re-assign/update values in a variable (or tensor)?


(Brando Miranda) #1

Why is the following not recommended?

w.data = w.data - eta*w.grad.data

or

w = w - eta*w.grad

I actually got from the official tutorials:

http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-nn

for param in model.parameters():
    param.data -= learning_rate * param.grad.data
and

w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data

(Francisco Massa) #2

Note that there is a difference between doing

param.data -= learning_rate * param.grad.data

and

param.data = param.data - learning_rate * param.grad.data

In the first case, the operation is performed in-place, so the Python object stays the same, while in the second one you create a new object.

To give an example:

import torch

a = torch.rand(3)
print(id(a))
b = a  # same python object
print(id(b))
b = b - 1
print(id(b)) # object reference changed!
a -= 1  # in-place operation doesn't change object
print(id(a))  # still the same object

How does one make sure that the parameters are updated manually in PyTorch using modules?
(Brando Miranda) #3

Thanks for the help! Just for completeness, I will try to address my question with the best solution I know so far:

W.data.copy_(new_value.data)

Not sure if this is good or if there are advantages and disadvantages to it, but I'm going to leave it here for future people to benefit from (and/or discuss).
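As a sketch of why the in-place copy is attractive: it keeps W as the same Python object, so anything else holding a reference to W still sees the update (eta, the shapes, and the loss here are purely illustrative):

```python
import torch

eta = 0.1
W = torch.randn(3, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()

old_id = id(W)
# copy_ writes the new values into W's existing storage, so every
# reference to W (e.g. inside an optimizer or a module) sees the update.
W.data.copy_(W.data - eta * W.grad.data)
assert id(W) == old_id  # still the same Python object
```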


I guess it's a little sad that:

W = W - eta*W.grad

doesn't work, because now it looks less like maths and is a bit harder to read, but eh, I'm being a bit pedantic…


(Brando Miranda) #4

Now this is just me being curious: is the fact that x = x + x re-binds the name while x += x operates in place a feature of Python or a feature of PyTorch? Like, could x = x + x have been made equivalent to x += x if the developers of PyTorch wanted? Just curious.


(Brando Miranda) #5

Sorry for being so insistent, but right now I am just immensely curious why:

W.data = W.data - eta*W.grad.data

would be a bad idea. It seems to work fine when I try it (maybe because it's not in-place?)


(Francisco Massa) #6

If you were to do W = W - eta * W.grad, you would be still storing the history of the computations for the update, which is not normally what you want to do.
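A quick way to see this, with a toy leaf tensor standing in for a real parameter (names and values are illustrative):

```python
import torch

eta = 0.1
w = torch.randn(2, requires_grad=True)
(w * w).sum().backward()

# Updating through the Variable records the subtraction in the graph:
w_new = w - eta * w.grad
assert w_new.grad_fn is not None  # the update itself has history attached

# Updating through .data bypasses autograd, so no history is kept:
w.data = w.data - eta * w.grad.data
assert w.grad_fn is None  # w is still a leaf with no recorded history
```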


(Francisco Massa) #7

This is a feature of Python, not specific to PyTorch. The same behaviour is present in NumPy.
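For example, plain Python lists behave the same way: x = x + y calls `__add__` and rebinds the name to a brand-new object, while x += y calls `__iadd__`, which a type is free to implement as an in-place mutation:

```python
a = [1, 2]
old = id(a)

a += [3]       # list.__iadd__ extends the same object in place
assert id(a) == old

a = a + [4]    # list.__add__ builds a new list; the name is rebound
assert id(a) != old
```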


(Brando Miranda) #8

You mean if you did:

W.data = W.data - eta*W.grad.data

If you did W = W - eta * W.grad, that would create no new Variables, so the previous history would be thrown away, no?


(Francisco Massa) #9

The fact that it works fine is a feature (as mentioned by @apaszke in slack), but there are reasons why it wouldn’t necessarily be the case.
W is a Variable that holds a tensor in W.data. Now, what happens if you change the tensor that W originally points to by doing W.data = new_tensor? W now points to new_tensor, but W is a Variable that was supposed to represent the original tensor.
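A sketch of why rebinding .data is fragile (the zero tensors here are just placeholders for new_tensor):

```python
import torch

w = torch.randn(3, requires_grad=True)
old_ptr = w.data.data_ptr()

# Rebinding .data swaps the underlying tensor out from under the Variable:
w.data = torch.zeros(3)
assert w.data.data_ptr() != old_ptr  # w now wraps a different tensor

# Nothing checks consistency: even the shape can change silently,
# while autograd still thinks w is the same leaf it was tracking.
w.data = torch.zeros(5)
assert w.shape == torch.Size([5])
```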


(Brando Miranda) #11

Sorry, last question: what's wrong with storing the history of the computations? (You also mentioned that on Slack; I didn't quite catch what was wrong with it.)


(Francisco Massa) #12

If you store the history of computations indefinitely, your computation graph will grow bigger at every iteration and you will never free memory, leading to out-of-memory issues.
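A sketch of that failure mode, with .detach() shown as one way to cut the graph between iterations (the loop bounds and values are illustrative):

```python
import torch

w = torch.ones(2, requires_grad=True)

# Updating through the Variable chains every step into one growing graph:
v = w
for _ in range(3):
    v = v - 0.1  # each iteration appends another node to v's history
assert v.grad_fn is not None  # the whole chain of updates is retained

# Cutting the history each iteration keeps the graph from growing:
u = w.detach()
for _ in range(3):
    u = (u - 0.1).detach()  # drop this iteration's nodes immediately
assert u.grad_fn is None  # no history accumulated across iterations
```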

