Is it possible to get the gradient w.r.t. a subset of coordinates by indexing the parameter vector?
For example, I was hoping that the code below would give me the gradients w.r.t. the first 2 coordinates of “x” instead of the whole “x” vector, but it gives me:
*** RuntimeError: differentiated input is unreachable
which tells me (correct me if I am wrong) that the indexing is returning a copy of the variable instead of a reference.
import torch
from torch.autograd import Variable
x = Variable(torch.randn(1,5), requires_grad = True)
y = Variable(torch.zeros(1))
loss = 0.5*((x - y)**2).sum()
# Runtime Error occurs here
grad = torch.autograd.grad(loss, x[0,:2], create_graph=True)[0]
If you are doing this for efficiency reasons, then don't. The way autograd works, it will end up computing the full gradient for x anyway. Instead, get the gradient w.r.t. x and narrow it down to the first two elements.
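In code, that suggestion looks roughly like this (just a sketch, reusing the x, y, and loss from your snippet):
# Differentiate w.r.t. the full parameter, then slice out the coordinates you need.
grad_x = torch.autograd.grad(loss, x, create_graph=True)[0]
grad_first_two = grad_x[0, :2]  # gradient w.r.t. the first 2 coordinates only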
As for the specific error, I'll get back to you on it tomorrow.
So, is there currently a way to efficiently compute the gradients for some coordinates only? I was thinking of using PyTorch for coordinate descent algorithms.
On another note, when you said that the full gradient is computed anyway, does that mean that when I apply operations on a Variable (with requires_grad=True) the gradient gets computed as well? Doesn't it get computed when I call .backward()? Or when I run torch.autograd.grad on the full input Variable?
I meant that it's computed during backward() and then the coordinates you request are narrowed down (rather than computing only the coordinates you want).
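Concretely (again just a sketch, with the same x and loss as above):
loss.backward()       # computes and stores the gradient for every element of x
print(x.grad[0, :2])  # then you simply narrow down to the coordinates you want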
About your original error message: it is expected.
x[0, :2] is unreachable from the original graph; every time you index x you get a new Variable whose history is independent of all other indexing ops.
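A quick way to see this (a small sketch, not from the original post):
a = x[0, :2]
b = x[0, :2]
print(a is b)  # False -- each indexing call builds a brand-new Variable with its own history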
One way to keep your use case efficient and make this work is to index x first, and then compute the loss only on those two elements.
For example:
x_ = x[0, :2]  # index first, so x_ is part of the graph that produces the loss
loss = 0.5*((x_ - y)**2).sum()
grad = torch.autograd.grad(loss, x_, create_graph=True)[0]  # gradient w.r.t. the slice only
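Note that x_ is its own Variable, so if you use this for coordinate descent you still have to write the update back into the full parameter yourself. A rough sketch (lr is a hypothetical step size, not from the thread):
lr = 0.1                         # hypothetical learning rate
x.data[0, :2] -= lr * grad.data  # write the coordinate update back into x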
I have a similar question. I have two sets of parameters, and I would like to do something like coordinate descent.
import torch
from torch import optim
from torch.autograd import Variable

x = Variable(torch.randn(10, 20), requires_grad=False)  # data
u = Variable(torch.randn(10, 5), requires_grad=True)    # param set 1
v = Variable(torch.randn(5, 20), requires_grad=True)    # param set 2

opt_u = optim.Adagrad([u])
opt_v = optim.Adagrad([v])

for i in range(10):
    opt_u.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. u
    opt_u.step()     # update only param set 1 (u), keeping set 2 fixed

    opt_v.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. v
    opt_v.step()     # update only param set 2 (v), keeping set 1 fixed
I would like PyTorch to avoid computing gradients w.r.t. some leaves.
Is it possible? If yes, will it provide any speed improvement?
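For what it's worth, one approach (an assumption on my part, not confirmed in this thread) is to toggle requires_grad on the leaf you want to keep fixed before each backward(), so autograd skips computing its gradient; whether that actually saves time depends on the graph:
for i in range(10):
    # update u, keep v fixed: backward() skips v.grad because requires_grad is off
    u.requires_grad = True
    v.requires_grad = False
    opt_u.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()
    opt_u.step()

    # update v, keep u fixed
    u.requires_grad = False
    v.requires_grad = True
    opt_v.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()
    opt_v.step()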