Get the gradient w.r.t. a subset of a parameter vector

Is it possible to get the gradient w.r.t. a subset of coordinates by indexing the parameter vector?

For example, I was hoping that the code below would give me the gradients w.r.t. the first 2 coordinates of “x” instead of the whole “x” vector, but it gives me:

*** RuntimeError: differentiated input is unreachable

which tells me (correct me if I am wrong) that the indexing is returning a copy of the variable instead of a reference.

import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 5), requires_grad=True)
y = Variable(torch.zeros(1))

loss = 0.5*((x - y)**2).sum()

# Runtime Error occurs here
grad = torch.autograd.grad(loss, x[0,:2], create_graph=True)[0]

If you are doing this for efficiency reasons, then don't. The way autograd works, it will end up computing the full gradient for x anyway. Instead, get the gradient w.r.t. x and narrow it down to the first two elements.
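For example, a rough sketch of that approach, reusing the x, y, and loss from the snippet above (the names full_grad and grad_first_two are just for illustration):

# Take the full gradient w.r.t. x, then keep only the first two coordinates.
full_grad = torch.autograd.grad(loss, x, create_graph=True)[0]  # shape (1, 5)
grad_first_two = full_grad[0, :2]                               # shape (2,)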

Coming to the specific error, I’ll get back to you on this tomorrow.

That makes sense! Thanks for your reply.

So, is there currently a way to efficiently compute the gradients for some coordinates only? I was thinking of using PyTorch for coordinate descent algorithms.

On another note, when you said that the full gradient is computed anyway, does that mean that when I apply operations on a Variable (with requires_grad=True) the gradient gets computed as well? Doesn't it get computed when I call .backward()? Or when I run torch.autograd.grad on the full input Variable?

Cheers!

On another note, when you said that the full gradient is computed anyway, does that mean that when I apply operations on a Variable (with requires_grad=True) the gradient gets computed as well?

I meant that it's computed during backward(), and then the coordinates you request are narrowed down (rather than computing only the coordinates you want).
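In other words, roughly (a small sketch reusing the x and loss from the original snippet):

loss.backward()                 # the full gradient is computed and stored in x.grad
grad_first_two = x.grad[0, :2]  # then you narrow it down to the coordinates you want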


About your original error message: it is expected.

x[0, :2] is unreachable from the original graph. Every time you index x you get a new Variable whose history is independent of all other indexing ops.
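A quick illustration of that point (a small sketch, not from the original post):

a = x[0, :2]
b = x[0, :2]
print(a is b)  # False: each indexing op returns a brand-new Variable
# The x[0, :2] passed to torch.autograd.grad was created after the loss,
# so it never appears in the loss's history and is reported as unreachable.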

One way to keep your use case efficient and make this work is to first index x, and then compute the loss only on those two elements.

For example:

x_ = x[0, :2]                    # index first, so x_ participates in the loss graph
loss = 0.5*((x_ - y)**2).sum()   # the loss now depends on x_ directly
grad = torch.autograd.grad(loss, x_, create_graph=True)[0]

I have a similar question. I have two sets of parameters, and I would like to do something like coordinate descent.

import torch
from torch import optim
from torch.autograd import Variable

x = Variable(torch.randn(10, 20), requires_grad=False)  # data
u = Variable(torch.randn(10, 5), requires_grad=True)    # param set 1
v = Variable(torch.randn(5, 20), requires_grad=True)    # param set 2

opt_u = optim.Adagrad([u])
opt_v = optim.Adagrad([v])

for i in range(10):
    opt_u.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. 'u'
    opt_u.step()     # update only param set 1 (u), keeping set 2 fixed

    opt_v.zero_grad()
    loss = -torch.sum(x * (u @ v))
    loss.backward()  # gives me grads w.r.t. u and v; I need only grads w.r.t. 'v'
    opt_v.step()     # update only param set 2 (v), keeping set 1 fixed

I would like PyTorch to avoid computing gradients w.r.t. some of the leaves.
Is this possible? If so, will it provide any speed improvement?