Is it possible to get the gradient w.r.t. a subset of coordinates by indexing the parameter vector?

For example, I was hoping that the code below would give me the gradients w.r.t. the first 2 coordinates of “x” instead of the whole “x” vector, but it gives me:

*** RuntimeError: differentiated input is unreachable

which tells me (correct me if I am wrong) that the indexing returns a copy of the variable instead of a reference.

import torch
from torch.autograd import Variable
x = Variable(torch.randn(1, 5), requires_grad=True)
y = Variable(torch.zeros(1))
loss = 0.5 * ((x - y) ** 2).sum()
# Runtime Error occurs here
grad = torch.autograd.grad(loss, x[0,:2], create_graph=True)[0]

If you are doing this for efficiency reasons, then don't. The way autograd works, it will end up computing the full gradient for x anyway. Instead, get the gradient w.r.t. x and narrow it down to the first two elements.
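A minimal sketch of that suggestion (written with plain `requires_grad` tensors, which play the role of Variables in newer PyTorch): compute the full gradient, then slice out the coordinates you need.

```python
import torch

# Same setup as the original snippet, with plain tensors standing in for Variables.
x = torch.randn(1, 5, requires_grad=True)
y = torch.zeros(1)
loss = 0.5 * ((x - y) ** 2).sum()

# Differentiate w.r.t. the full leaf tensor, then narrow to the first two coordinates.
grad_x, = torch.autograd.grad(loss, x)
grad_first_two = grad_x[0, :2]  # shape (2,)
```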

Coming to the specific error, I’ll get back to you on this tomorrow.

So, is there currently a way to efficiently compute the gradients for some coordinates only? I was thinking of using PyTorch for coordinate descent algorithms.

On another note, when you said that the full gradient is computed anyway, does that mean that when I apply operations on a Variable (with requires_grad=True) the gradient gets computed as well? Doesn't it get computed when I call .backward()? Or when I run torch.autograd.grad on the full input Variable?

On another note, when you said that the full gradient is computed anyway, does that mean that when I apply operations on a Variable (with requires_grad=True) the gradient gets computed as well?

I meant it's computed during backward(), and then the co-ordinates you request are narrowed down (rather than computing only the co-ordinates you desire).
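In other words (a small sketch of my own, not from the thread): backward() fills in x.grad for every coordinate, and any narrowing happens afterwards on the full result.

```python
import torch

x = torch.randn(1, 5, requires_grad=True)
loss = 0.5 * (x ** 2).sum()
loss.backward()            # computes and stores the gradient for ALL of x
first_two = x.grad[0, :2]  # the narrowing happens after the fact
```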

About your original error message: it is expected.

x[0, :2] is unreachable from the loss's graph. Every time you index x you get a new Variable whose history is independent of all other indexing ops.
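This is easy to check (a quick sketch of my own): each indexing op produces a fresh, non-leaf tensor with its own grad_fn, so a slice taken after the loss was built never appears in the loss's graph.

```python
import torch

x = torch.randn(1, 5, requires_grad=True)
a = x[0, :2]
b = x[0, :2]

# Two identical indexing ops still yield two distinct, non-leaf tensors.
print(a is b)     # False: each indexing op creates a new tensor
print(a.is_leaf)  # False: a has its own grad_fn pointing back to x
```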

One way to keep your use case efficient and make this work: first index x, and then compute the loss only on those two elements.

For example:

x_ = x[0, :2]
loss = 0.5 * ((x_ - y) ** 2).sum()
grad = torch.autograd.grad(loss, x_, create_graph=True)[0]
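For completeness, a self-contained version of that fix (plain-tensor style; shapes follow the original snippet): the slice is taken before the loss is built, so it is part of the graph.

```python
import torch

x = torch.randn(1, 5, requires_grad=True)
y = torch.zeros(1)

x_ = x[0, :2]                       # index FIRST, so x_ is in the graph
loss = 0.5 * ((x_ - y) ** 2).sum()  # loss built from the slice only
grad = torch.autograd.grad(loss, x_, create_graph=True)[0]
```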

I have a similar question. I have two sets of parameters, and I would like to do something like co-ordinate descent.

import torch
from torch import optim
from torch.autograd import Variable

x = Variable(torch.randn(10, 20), requires_grad=False)  # data
u = Variable(torch.randn(10, 5), requires_grad=True)    # param set 1
v = Variable(torch.randn(5, 20), requires_grad=True)    # param set 2
opt_u = optim.Adagrad([u])
opt_v = optim.Adagrad([v])
for i in range(10):
    opt_u.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. 'u'
    opt_u.step()     # update only param set 1 (u), keeping set 2 fixed
    opt_v.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. 'v'
    opt_v.step()     # update only param set 2 (v), keeping set 1 fixed

I would like PyTorch to avoid computing gradients w.r.t. some leaves.
Is it possible? If so, will it provide any speed improvement?
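One possible way to sketch this (my own suggestion, not an answer given in the thread, written with plain `requires_grad` tensors): temporarily freeze the parameter set you are not updating, so autograd skips it for that backward pass.

```python
import torch

x = torch.randn(10, 20)                     # data
u = torch.randn(10, 5, requires_grad=True)  # param set 1
v = torch.randn(5, 20, requires_grad=True)  # param set 2

# Freeze v for the u-update step: autograd then does not store a gradient for v.
v.requires_grad_(False)
loss = -torch.sum(x * (u @ v))
loss.backward()          # populates u.grad; v.grad stays None
v.requires_grad_(True)   # unfreeze before the v-update step
```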