# How to compute the gradient of a component of a vector-valued function?

Let’s say I have a function Psi with a 4-dimensional vector output, that takes a 3-dimensional vector u as input. I would like to compute the gradient of the first three components of Psi w.r.t. the respective three components of u:

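import torch

# assumed example values; u must have requires_grad=True
u = torch.tensor([1., 2., 3.], requires_grad=True)

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

grad_Psi_0 = torch.autograd.grad(psi[0], u[0])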



And I get the error that u[0],u[1], and u[2] are not used in the graph:

---> 19 grad_Psi_0 = torch.autograd.grad(psi[0], u[0])

273     return _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(grad_outputs)
274 else:
--> 275     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
276         outputs, grad_outputs_, retain_graph, create_graph, inputs,

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


Why is this the case? I am using u[i] to build psi…

Your leaf tensor here is u, not u[0] or u[1]; that’s why you get this error.

The code you want is shown below:

def func(u):

print(out)
#returns
tensor([[2., 0., 0.],
[0., 2., 0.],
[0., 0., 2.],
[2., 4., 6.]])


I also got a tip on Stack Overflow about this issue, and this is what I got working using the answer from there:

import torch

# v = psi[0:3] = grad(Phi) = [2*u0, 2*u1, 2*u2]
# psi[3] = Phi = u0**2 + u1**2 + u2**2 = dot(u,u)

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

print("u = ",u)
print("psi = ",psi)

# Divergence of the vector psi[0:3] = [2*u0, 2*u1, 2*u2] w.r.t. [u0, u1, u2]: 2 + 2 + 2 = 6
print(div_v)

# laplace(psi[3]) = \partial_u0^2 psi[3] + \partial_u1^2 psi[3] + \partial_u2^2 psi[3]
# = \partial_u0 2u0 + \partial_u1 2u1 + \partial_u2 2u2 = 2 + 2 + 2 = 6
print(d_phi_du)

laplace_phi = torch.dot(dd_phi_d2u0 + dd_phi_d2u1 + dd_phi_d2u2, torch.ones(3))

print(laplace_phi)
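The snippet above omits the definition of `u` and the `torch.autograd.grad` calls that produce `div_v`, `d_phi_du`, and the `dd_phi_d2u*` terms. A minimal sketch of how they could be computed follows; the value of `u` and the exact form of each call are assumptions, not the original code.

```python
import torch

# Assumed input; any 3-vector with requires_grad=True would do
u = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

psi = torch.zeros(4)
psi[0] = 2 * u[0]
psi[1] = 2 * u[1]
psi[2] = 2 * u[2]
psi[3] = torch.dot(u, u)

# Divergence of psi[0:3]: sum over i of d psi[i] / d u[i].
# Each grad call returns the full row d psi[i] / d u; only entry i is kept.
div_v = sum(
    torch.autograd.grad(psi[i], u, retain_graph=True)[0][i] for i in range(3)
)
print(div_v)  # 2 + 2 + 2 = 6

# First derivatives of psi[3]; create_graph=True keeps them differentiable
d_phi_du = torch.autograd.grad(psi[3], u, create_graph=True)[0]
print(d_phi_du)  # values 2*u = [2, 4, 6]

# Second derivatives: differentiate each first derivative w.r.t. u again
dd_phi_d2u0 = torch.autograd.grad(d_phi_du[0], u, retain_graph=True)[0]
dd_phi_d2u1 = torch.autograd.grad(d_phi_du[1], u, retain_graph=True)[0]
dd_phi_d2u2 = torch.autograd.grad(d_phi_du[2], u, retain_graph=True)[0]

# Each term is one row of the Hessian; summing all entries gives the trace here
# because the off-diagonal entries are zero for Phi = dot(u, u)
laplace_phi = torch.dot(dd_phi_d2u0 + dd_phi_d2u1 + dd_phi_d2u2, torch.ones(3))
print(laplace_phi)  # 2 + 2 + 2 = 6
```

Note that the final `torch.dot(..., torch.ones(3))` only equals the Laplacian when the mixed second derivatives vanish, as they do for this particular psi[3].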


The answer from @AlphaBetaGamma96 uses the Jacobian, and the variant above takes the partial derivative w.r.t. the whole u vector. Is there no option to differentiate w.r.t. a single component of u? With a Jacobian, gradients are computed that are not needed, and the same goes for grad(psi[i], u).

When you calculate gradients via torch.autograd.grad, you take an input vector u (not u[0] or u[1], as these are just views on u) and pass it through a model to calculate psi. PyTorch sees u as a whole and how it maps to psi. When you call torch.autograd.grad(psi, u), you are telling PyTorch to calculate, via reverse-mode AD, the gradient of psi with respect to u. And, as u was used in your model to compute psi, it can quite happily calculate that gradient.

However, when you pass something like torch.autograd.grad(psi, u[0]), you are asking autograd for the gradient of psi with respect to u[0]. But that u[0] was never used in the computation of psi: indexing creates a new view on u each time, and the view you pass to torch.autograd.grad is not a tensor that psi was computed from.

That’s why autograd gives you an error message saying:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


because your model used u to compute psi, not the view u[0] that you passed in. Does that make sense?
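To make the error concrete, here is a small sketch that reproduces it and shows what `allow_unused=True` does. The values of `u` are assumed, and the names `raised` and `g` are only for illustration.

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # assumed values

psi = torch.zeros(4)
psi[0] = 2 * u[0]  # this u[0] is a temporary tensor created by indexing
psi[1] = 2 * u[1]
psi[2] = 2 * u[2]
psi[3] = torch.dot(u, u)

# Every u[0] expression creates a different tensor object
print(u[0] is u[0])  # False

# Passing a fresh u[0] as the input raises the "not used in the graph" error
try:
    torch.autograd.grad(psi[0], u[0], retain_graph=True)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# allow_unused=True suppresses the error and reports the missing dependency as None
(g,) = torch.autograd.grad(psi[0], u[0], allow_unused=True)
print(g)  # None
```

A `None` result here does not mean the derivative is zero; it means autograd found no path from psi[0] back to the tensor that was passed in.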


If I understand you correctly, this would mean the operations in constructing psi

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


that use components of u do not record in the graph that those same u-components take part in building the psi-components? If this is true, I find it kinda surprising. If u is a vector and I build new tensors using its components explicitly like this, and taking a component (slicing a tensor) is an operation that can be differentiated (?), I would expect the dependency psi[i] → depends on → u[i] to be stored in the graph.

Then computing grad(psi, u[i]) would make sense, because mathematically I am taking a Jacobian of a vector-valued function w.r.t. one of its leaf input variables, u[i]… on paper it makes sense, at least to me.

But the thing is, you’re not building a new tensor via u[0]; you’re taking a view on u for its 0-th element. That’s why autograd can’t see how individual components of u act within the computation graph: it only sees u, not u[0] or u[1]. That’s why you get that error about unused Tensors.

u = torch.randn(3, requires_grad=True)
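A minimal sketch of the distinction, assuming the indexed tensor is stored in a variable and reused (`u0`, `psi0`, `g`, and `g_u` are illustrative names, not from the original post). When the same view object is used both to build the output and as the input to `torch.autograd.grad`, the call succeeds:

```python
import torch

u = torch.randn(3, requires_grad=True)

# Index once and keep the result: u0 is a non-leaf tensor recorded in the graph
u0 = u[0]
psi0 = 2 * u0

# Works, because u0 is the same object that psi0 was computed from
(g,) = torch.autograd.grad(psi0, u0, retain_graph=True)
print(g)  # tensor(2.)

# The dependency on the full vector u is still there as well
(g_u,) = torch.autograd.grad(psi0, u)
print(g_u)  # tensor([2., 0., 0.])
```

Writing `u[0]` again inside the `grad` call would create a second view that `psi0` was never computed from, which is the situation that triggers the error.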


Think of this computation via this (pretty poor ASCII diagram)

u -> model -> psi


When you do torch.autograd.grad(psi, u[0]):

u -> model -> psi
|
v
u[0]

As you can see, psi and u[0] have no connection and hence no gradient.


Yeah, thanks a lot! I think I got that. I’m just saying that, as a person just starting to use torch.autograd, coming from another background (where there are no views), and having just read about the DAG/AD etc., I wasn’t expecting this to be the case. Maybe it would be good to add this small example to the official documentation and warn newcomers like myself that u[0] does not reach into u but generates a new view, which is then not accounted for in the DAG.

If I look at how my psi is defined:

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


mathematically, its components are constructed as functions of the components of the vector u, so to my beginner’s mind,

u[0], u[1], u[2] -> psi

or

psi = psi(u[0],u[1],u[2])

then I should be able to do d psi / du[2] in this graph. And what you’re saying (if I understood you), when I call u[0], I create a separate view of the component of u, disconnected from the way psi is computed? Maybe I am not understanding it after all…

Exactly. When you call torch.autograd.grad(psi, u[0]), you are telling autograd to find the gradient of psi with respect to u[0]. However, psi was never computed from that u[0]; only u was used, and hence there’s no gradient.
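If derivatives with respect to individual components are really what is wanted, one possible workaround (a sketch, not something proposed in the thread) is to make each component its own scalar leaf and assemble `u` from them. The names `u0`, `u1`, `u2` and their values are assumptions for illustration.

```python
import torch

# Hypothetical setup: one scalar leaf tensor per component (values assumed)
u0 = torch.tensor(1.0, requires_grad=True)
u1 = torch.tensor(2.0, requires_grad=True)
u2 = torch.tensor(3.0, requires_grad=True)

# Assemble the vector from the leaves; torch.stack is differentiable
u = torch.stack([u0, u1, u2])

psi = torch.zeros(4)
psi[0] = 2 * u[0]
psi[1] = 2 * u[1]
psi[2] = 2 * u[2]
psi[3] = torch.dot(u, u)

# The scalar leaves are part of the graph, so they are valid grad inputs
(d_psi3_du2,) = torch.autograd.grad(psi[3], u2, retain_graph=True)
print(d_psi3_du2)  # tensor(6.), i.e. 2 * u2

(d_psi0_du0,) = torch.autograd.grad(psi[0], u0)
print(d_psi0_du0)  # tensor(2.)
```

Autograd still walks back through the graph from psi[i], so this mainly changes which tensors can be named as inputs rather than guaranteeing less work.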