I’m new to PyTorch and I like it a lot! But I have a problem with the gradient of a re-used variable.
Here is my code:
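import torch as t
from torch.autograd import Variable as v

step = 3  # number of iterations; not defined in the original post, value assumed here

# starting row vector of shape (1, 2)
x0 = t.FloatTensor([2, 1]).view(1, 2)
x0 = v(x0, requires_grad=True)
for _ in range(step):
    # a fresh x2 is created on every iteration
    x2 = t.FloatTensor([[1, 2], [1, 2]])
    x2 = v(x2, requires_grad=True)
    # the name x0 is rebound to the result of the product
    x0 = t.mm(x0, x2)
y = v(t.FloatTensor([[1, 2], [3, 4]]))
z = t.mm(x0, y)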
Is the gradient of x2 accumulated at every step? Why is the gradient of x0 None? And how can we get the gradient of x0 at every step? Thanks a lot!
Because you redefined x0. The second x0 is not the x0 you defined the first time. You are not reusing the same variable, you are rebinding the name to a different variable. Your new variable here is an intermediate result, and intermediate results do not retain gradients by default. You can call retain_grad() on it to get its gradient: http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.retain_grad
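For illustration, here is a minimal sketch of the retain_grad() approach. It assumes a recent PyTorch release where plain tensors replace Variable, and the names steps, w, x, intermediates, leaf_grad and step_grads are made up for this example. Reducing z with sum() before calling backward() is also an assumption, since the original post does not show the backward call.

```python
import torch

# Leaf tensor: created directly by the user, so autograd stores its .grad.
x0 = torch.tensor([[2.0, 1.0]], requires_grad=True)
y = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

steps = 2  # assumed value for this sketch
x = x0     # separate name for the running product, so x0 keeps naming the leaf
intermediates = []
for _ in range(steps):
    # A new weight tensor is created on every iteration, as in the question.
    w = torch.tensor([[1.0, 2.0], [1.0, 2.0]], requires_grad=True)
    x = torch.mm(x, w)
    x.retain_grad()  # ask autograd to keep .grad on this non-leaf result
    intermediates.append(x)

z = torch.mm(x, y)
z.sum().backward()  # backward() needs a scalar, so z is summed here

leaf_grad = x0.grad                          # gradient on the original leaf
step_grads = [h.grad for h in intermediates]  # one gradient per step
print(leaf_grad)
print(step_grads)
```

Keeping the leaf under its own name (x0) and using a second name (x) for the running product means both the leaf gradient and the per-step gradients stay reachable after backward().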
Thanks for your reply! Now I know how to obtain gradient for every step.
But I’m still wondering: after I change x0 at every step, when I call backward() and apply the gradients to the variables with step(), the original x0 (the one first defined) will still be updated, and I just can’t print out its gradient, right?
That’s right. Just think of them as regular Python objects. The symbol name is just a pointer.
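A small sketch of this point, assuming a recent PyTorch release and torch.optim.SGD as the optimizer (the thread does not say which optimizer is used). The names original, optimizer and w, and the learning rate of 0.1, are chosen only for this example.

```python
import torch

x0 = torch.tensor([[2.0, 1.0]], requires_grad=True)
original = x0  # a second name for the same leaf tensor object

# The optimizer holds a reference to the object, not to the name x0.
optimizer = torch.optim.SGD([original], lr=0.1)  # lr is an assumed value

w = torch.tensor([[1.0, 2.0], [1.0, 2.0]])
x0 = torch.mm(x0, w)  # rebinding: x0 now names a non-leaf result

x0.sum().backward()

print(original.is_leaf, x0.is_leaf)       # the leaf is still reachable via `original`
grad_on_original = original.grad.clone()  # the gradient landed on the leaf
optimizer.step()                          # updates the leaf in place
print(grad_on_original)
print(original)
```

Keeping an extra reference such as original is one way to still read the leaf's gradient after the name x0 has been rebound.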
You solved my confusion, thanks a lot!