# How to compute the gradient of a component of a vector-valued function?

Let’s say I have a function Psi with a 4-dimensional vector output that takes a 3-dimensional vector u as input. I would like to compute the gradient of the first three components of Psi w.r.t. the respective three components of u:

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)  # 3-dimensional input

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)



And I get the error that u[0], u[1], and u[2] are not used in the graph:

---> 19 grad_Psi_0 = torch.autograd.grad(psi[0], u[0])

273     return _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(grad_outputs)
274 else:
--> 275     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
276         outputs, grad_outputs_, retain_graph, create_graph, inputs,

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


Why is this the case? I am using u[i] to build psi…

Your leaf Tensor here is u, not u[0], u[1], or u[2], so that’s why you have the error you do.

The code you want is shown below

def func(u):
    psi = torch.zeros(4)
    psi[0] = 2*u[0]
    psi[1] = 2*u[1]
    psi[2] = 2*u[2]
    psi[3] = torch.dot(u,u)
    return psi

u = torch.tensor([1., 2., 3.], requires_grad=True)
out = torch.autograd.functional.jacobian(func, u)  # row i is d psi[i] / d u
print(out)
#returns
tensor([[2., 0., 0.],
        [0., 2., 0.],
        [0., 0., 2.],
        [2., 4., 6.]])


I got a tip on Stack Overflow regarding this issue also, so this is what I made work using the answer from there:

import torch

# psi[0:3] = grad(Phi) = [2*u0, 2*u1, 2*u2]
# Phi = u0**2 + u1**2 + u2**2 = dot(u,u)

u = torch.tensor([1., 2., 3.], requires_grad=True)

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

print("u = ", u)
print("psi = ", psi)

# Divergence of the vector psi[0:3] = [2u0, 2u1, 2u2] w.r.t. [u0, u1, u2] = 2+2+2 = 6
d_psi0_du = torch.autograd.grad(psi[0], u, retain_graph=True)[0]
d_psi1_du = torch.autograd.grad(psi[1], u, retain_graph=True)[0]
d_psi2_du = torch.autograd.grad(psi[2], u, retain_graph=True)[0]
div_v = d_psi0_du[0] + d_psi1_du[1] + d_psi2_du[2]
print(div_v)

# laplace(Phi) = \partial_u0^2 Phi + \partial_u1^2 Phi + \partial_u2^2 Phi
# = \partial_u0 2u0 + \partial_u1 2u1 + \partial_u2 2u2 = 2 + 2 + 2 = 6
d_phi_du = torch.autograd.grad(psi[3], u, retain_graph=True, create_graph=True)[0]
print(d_phi_du)

dd_phi_d2u0 = torch.autograd.grad(d_phi_du[0], u, retain_graph=True)[0]
dd_phi_d2u1 = torch.autograd.grad(d_phi_du[1], u, retain_graph=True)[0]
dd_phi_d2u2 = torch.autograd.grad(d_phi_du[2], u, retain_graph=True)[0]
laplace_phi = torch.dot(dd_phi_d2u0 + dd_phi_d2u1 + dd_phi_d2u2, torch.ones(3))

print(laplace_phi)


The answer from @AlphaBetaGamma96 uses the Jacobian, and this variant above uses partial derivatives w.r.t. the whole u vector. Is there not an option to differentiate w.r.t. a single u-component? I mean, with a Jacobian, gradients are computed that are not needed, same as with grad(psi[i], u).
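One possible alternative, sketched here only as an illustration (names and values are made up for the sketch): forward-mode AD via torch.autograd.forward_ad computes one Jacobian column per pass, i.e. d psi / d u[j] for a single chosen j, without forming the whole Jacobian.

import torch
import torch.autograd.forward_ad as fwAD

u = torch.tensor([1., 2., 3.])
e0 = torch.tensor([1., 0., 0.])   # tangent direction: derivatives w.r.t. u[0]

with fwAD.dual_level():
    u_dual = fwAD.make_dual(u, e0)
    psi = torch.stack([2*u_dual[0], 2*u_dual[1], 2*u_dual[2],
                       torch.dot(u_dual, u_dual)])
    # the tangent of psi is the Jacobian-vector product J @ e0 = d psi / d u[0]
    col0 = fwAD.unpack_dual(psi).tangent.clone()

print(col0)   # tensor([2., 0., 0., 2.])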

When you calculate gradients via torch.autograd.grad, what you’re doing is taking an input vector u (not u[0] or u[1], as these are just views on u) and passing it through a model to calculate psi. PyTorch only ever sees u in its entirety and how it maps to psi. When you call torch.autograd.grad(psi, u) you are telling PyTorch to calculate (via reverse-mode AD) the gradient of psi with respect to u. And, as u and psi were both used in your model, it can quite happily calculate that gradient.

However, when you pass something like torch.autograd.grad(psi, u[0]), you are asking PyTorch’s autograd to find the gradient of psi with respect to u[0], but the variable u[0] was never used in the computation of psi; the whole vector u was. When you pass u[0] to torch.autograd.grad you are passing a view on u which wasn’t used in the gradient computation.

That’s why autograd gives you an error message saying,

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


because your model used u to compute psi, and not the view u[0]. That’s why you get the error you do. Does that make sense?
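A minimal sketch of the difference (values and names are illustrative): differentiating psi[0] with respect to the whole u succeeds, while passing the view u[0] reproduces the error.

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

# w.r.t. the whole leaf tensor u: works
print(torch.autograd.grad(psi[0], u, retain_graph=True)[0])   # tensor([2., 0., 0.])

# w.r.t. the view u[0]: raises "appears to not have been used in the graph"
try:
    torch.autograd.grad(psi[0], u[0], retain_graph=True)
except RuntimeError as e:
    print(e)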


If I understand you correctly, this would mean the operations in constructing psi

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


that use components of u do not propagate to the graph the information that those same u-components take part in building psi-components? If this is true, I find this kind of surprising. If u is a vector and I build new tensors using its components explicitly like this, and taking a component (slicing a tensor) is an operation that can be differentiated (?), I would expect the dependency information psi[i] → depends on → u[i] to be stored in the graph.

Then computing grad(psi, u[i]) would make sense, because mathematically I am taking the Jacobian of a vector-valued function w.r.t. its one leaf input variable u[i]… on paper it makes sense, at least to me.

But the thing is, you’re not building a new tensor via u[0]; you’re taking a view on u for its 0-th element. That’s why autograd can’t see how individual components of u act within the computation graph: it only sees u, not u[0] nor u[1]. That’s why you get that error about unused Tensors.

u = torch.randn(3, requires_grad=True)
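A small illustrative check of that point (names are just for the sketch): u is the leaf, every u[0] is a freshly created view, and autograd can only differentiate with respect to a view if that particular captured view is itself used in the computation.

import torch

u = torch.randn(3, requires_grad=True)

print(u.is_leaf)       # True  - u is the leaf tensor autograd tracks
print(u[0].is_leaf)    # False - u[0] is the output of an indexing (select) op
print(u[0].grad_fn)    # e.g. <SelectBackward0 ...>
print(u[0] is u[0])    # False - every u[0] is a brand-new view object

# If the view is captured once and actually used, autograd can differentiate w.r.t. it:
u0 = u[0]
psi0 = 2 * u0
print(torch.autograd.grad(psi0, u0))   # (tensor(2.),)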


Think of this computation via this (pretty poor ASCII diagram)

u -> model -> psi


When you do torch.autograd.grad(psi, u[0])

u -> model -> psi

|
v

u[0]


As you can see, psi and u[0] have no connection and hence no gradient
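A minimal sketch of that missing connection (illustrative values): with allow_unused=True, as the error message suggests, autograd returns None for the disconnected view instead of raising.

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

# u was used to build psi, the fresh view u[0] was not
grads = torch.autograd.grad(psi[0], (u, u[0]), allow_unused=True)
print(grads)   # (tensor([2., 0., 0.]), None)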

> But the thing is, you’re not building a new tensor via u[0]; you’re taking a view on u for its 0-th element. That’s why autograd can’t see how individual components of u act within the computation graph: it only sees u, not u[0] nor u[1]. That’s why you get that error about unused Tensors.

Yeah, thanks a lot! I think I got that; I’m just saying, as a person just starting to use torch.autograd, with another background (where there are no views), just after reading about the DAG/AD, etc., I wasn’t expecting this to be the case. Maybe it would be good to add this small example to the official documentation and warn newcomers like myself that u[0] does not reach into u but generates a new view, which is then not accountedted for in the DAG.

If I look at how my psi is defined:

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


mathematically, its components are constructed as functions of the components of the vector u, so to my beginner’s mind,

u[0], u[1], u[2] -> psi

or

psi = psi(u[0], u[1], u[2])

then I should be able to do d psi / d u[i] in this graph. And what you’re saying (if I understood you) is that when I call u[i], I create a separate view of the component of u, disconnected from the way psi is computed? Maybe I am not understanding it after all…

Exactly, when you call torch.autograd.grad(psi, u[0]) you are telling autograd to find the gradient of psi with respect to u[0]. However, psi never used u[0] to be calculated; only u was used, and hence there’s no gradient.
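A minimal sketch of the practical consequence (illustrative names): take the gradient with respect to the whole u and index the result to get the single partial d psi[0] / d u[0].

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

d_psi0_du = torch.autograd.grad(psi[0], u)[0]   # tensor([2., 0., 0.])
print(d_psi0_du[0])                             # d psi[0] / d u[0] = tensor(2.)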