# How to compute the gradient of a component of a vector-valued function?

Let’s say I have a function Psi with a 4-dimensional vector output that takes a 3-dimensional vector u as input. I would like to compute the gradient of the first three components of Psi w.r.t. the respective three components of u:

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)  # 3-dimensional input

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)



And I get the error that u[0], u[1], and u[2] are not used in the graph:

---> 19 grad_Psi_0 = torch.autograd.grad(psi[0], u[0])

273     return _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(grad_outputs)
274 else:
--> 275     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
276         outputs, grad_outputs_, retain_graph, create_graph, inputs,

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


Why is this the case? I am using u[i] to build psi…

Your leaf Tensor here is u, not u[0], u[1], or u[2], so that’s why you have the error you do.

The code you want is shown below

def func(u):
    psi = torch.zeros(4)
    psi[0] = 2*u[0]
    psi[1] = 2*u[1]
    psi[2] = 2*u[2]
    psi[3] = torch.dot(u,u)
    return psi

u = torch.tensor([1., 2., 3.], requires_grad=True)
out = torch.autograd.functional.jacobian(func, u)  # row i is d psi[i] / d u
print(out)
#returns
tensor([[2., 0., 0.],
        [0., 2., 0.],
        [0., 0., 2.],
        [2., 4., 6.]])


I got a tip on Stack Overflow regarding this issue also, so this is what I made work using the answer from there:

import torch

# psi[0:3] = grad(Phi) = [2*u0, 2*u1, 2*u2]
# Phi = u0**2 + u1**2 + u2**2 = dot(u,u)

u = torch.tensor([1., 2., 3.], requires_grad=True)

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

print("u = ", u)
print("psi = ", psi)

# Divergence of the vector psi[0:3] = [2u0, 2u1, 2u2] w.r.t. [u0, u1, u2] = 2+2+2 = 6
d_psi0_du = torch.autograd.grad(psi[0], u, retain_graph=True)[0]
d_psi1_du = torch.autograd.grad(psi[1], u, retain_graph=True)[0]
d_psi2_du = torch.autograd.grad(psi[2], u, retain_graph=True)[0]
div_v = d_psi0_du[0] + d_psi1_du[1] + d_psi2_du[2]
print(div_v)

# laplace(Phi) = \partial_u0^2 Phi + \partial_u1^2 Phi + \partial_u2^2 Phi
# = \partial_u0 2u0 + \partial_u1 2u1 + \partial_u2 2u2 = 2 + 2 + 2 = 6
d_phi_du = torch.autograd.grad(psi[3], u, retain_graph=True, create_graph=True)[0]
print(d_phi_du)

dd_phi_d2u0 = torch.autograd.grad(d_phi_du[0], u, retain_graph=True)[0]
dd_phi_d2u1 = torch.autograd.grad(d_phi_du[1], u, retain_graph=True)[0]
dd_phi_d2u2 = torch.autograd.grad(d_phi_du[2], u, retain_graph=True)[0]
laplace_phi = torch.dot(dd_phi_d2u0 + dd_phi_d2u1 + dd_phi_d2u2, torch.ones(3))

print(laplace_phi)


The answer from @AlphaBetaGamma96 uses the Jacobian, and this variant above uses partial derivatives w.r.t. the whole u vector. Is there not an option to differentiate w.r.t. a single u-component? I mean, with a Jacobian, gradients are computed that are not needed, same as with grad(psi[i], u).
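One possible alternative, sketched here only as an illustration (names and values are made up for the sketch): forward-mode AD via torch.autograd.forward_ad computes one Jacobian column per pass, i.e. d psi / d u[j] for a single chosen j, without forming the whole Jacobian.

import torch
import torch.autograd.forward_ad as fwAD

u = torch.tensor([1., 2., 3.])
e0 = torch.tensor([1., 0., 0.])   # tangent direction: derivatives w.r.t. u[0]

with fwAD.dual_level():
    u_dual = fwAD.make_dual(u, e0)
    psi = torch.stack([2*u_dual[0], 2*u_dual[1], 2*u_dual[2],
                       torch.dot(u_dual, u_dual)])
    # the tangent of psi is the Jacobian-vector product J @ e0 = d psi / d u[0]
    col0 = fwAD.unpack_dual(psi).tangent.clone()

print(col0)   # tensor([2., 0., 0., 2.])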

When you calculate gradients via torch.autograd.grad, what you’re doing is taking an input vector u (not u[0] or u[1], as these are just views on u) and passing it through a model to calculate psi. PyTorch only ever sees u in its entirety and how it maps to psi. When you call torch.autograd.grad(psi, u) you are telling PyTorch to calculate (via reverse-mode AD) the gradient of psi with respect to u. And, as u and psi were both used in your model, it can quite happily calculate that gradient.

However, when you pass something like torch.autograd.grad(psi, u[0]), you are asking PyTorch’s autograd to find the gradient of psi with respect to u[0], but the variable u[0] was never used in the computation of psi; the whole vector u was. When you pass u[0] to torch.autograd.grad you are passing a view on u which wasn’t used in the gradient computation.

That’s why autograd gives you an error message saying,

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.


because your model used u to compute psi, and not the view u[0]. That’s why you get the error you do. Does that make sense?
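A minimal sketch of the difference (values and names are illustrative): differentiating psi[0] with respect to the whole u succeeds, while passing the view u[0] reproduces the error.

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

# w.r.t. the whole leaf tensor u: works
print(torch.autograd.grad(psi[0], u, retain_graph=True)[0])   # tensor([2., 0., 0.])

# w.r.t. the view u[0]: raises "appears to not have been used in the graph"
try:
    torch.autograd.grad(psi[0], u[0], retain_graph=True)
except RuntimeError as e:
    print(e)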


If I understand you correctly, this would mean the operations in constructing psi

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


that use components of u do not propagate to the graph the information that those same u-components take part in building psi-components? If this is true, I find this kind of surprising. If u is a vector and I build new tensors using its components explicitly like this, and taking a component (slicing a tensor) is an operation that can be differentiated (?), I would expect the dependency information psi[i] → depends on → u[i] to be stored in the graph.

Then computing grad(psi, u[i]) would make sense, because mathematically I am taking the Jacobian of a vector-valued function w.r.t. its one leaf input variable u[i]… on paper it makes sense, at least to me.

But the thing is, you’re not building a new tensor via u[0]; you’re taking a view on u for its 0-th element. That’s why autograd can’t see how individual components of u act within the computation graph: it only sees u, not u[0] nor u[1]. That’s why you get that error about unused Tensors.

u = torch.randn(3, requires_grad=True)
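A small illustrative check of that point (names are just for the sketch): u is the leaf, every u[0] is a freshly created view, and autograd can only differentiate with respect to a view if that particular captured view is itself used in the computation.

import torch

u = torch.randn(3, requires_grad=True)

print(u.is_leaf)       # True  - u is the leaf tensor autograd tracks
print(u[0].is_leaf)    # False - u[0] is the output of an indexing (select) op
print(u[0].grad_fn)    # e.g. <SelectBackward0 ...>
print(u[0] is u[0])    # False - every u[0] is a brand-new view object

# If the view is captured once and actually used, autograd can differentiate w.r.t. it:
u0 = u[0]
psi0 = 2 * u0
print(torch.autograd.grad(psi0, u0))   # (tensor(2.),)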


Think of this computation via this (pretty poor ASCII diagram)

u -> model -> psi


When you do torch.autograd.grad(psi, u[0])

u -> model -> psi

|
v

u[0]


As you can see, psi and u[0] have no connection and hence no gradient
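A minimal sketch of that missing connection (illustrative values): with allow_unused=True, as the error message suggests, autograd returns None for the disconnected view instead of raising.

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

# u was used to build psi, the fresh view u[0] was not
grads = torch.autograd.grad(psi[0], (u, u[0]), allow_unused=True)
print(grads)   # (tensor([2., 0., 0.]), None)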

> But the thing is, you’re not building a new tensor via u[0]; you’re taking a view on u for its 0-th element. That’s why autograd can’t see how individual components of u act within the computation graph: it only sees u, not u[0] nor u[1]. That’s why you get that error about unused Tensors.

Yeah, thanks a lot! I think I got that; I’m just saying, as a person just starting to use torch.autograd, with another background (where there are no views), just after reading about the DAG/AD, etc., I wasn’t expecting this to be the case. Maybe it would be good to add this small example to the official documentation and warn newcomers like myself that u[0] does not reach into u but generates a new view, which is then not accountedted for in the DAG.

If I look at how my psi is defined:

psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)


mathematically, its components are constructed as functions of the components of the vector u, so to my beginner’s mind,

u[0], u[1], u[2] -> psi

or

psi = psi(u[0], u[1], u[2])

then I should be able to do d psi / d u[i] in this graph. And what you’re saying (if I understood you) is that when I call u[i], I create a separate view of the component of u, disconnected from the way psi is computed? Maybe I am not understanding it after all…

Exactly, when you call torch.autograd.grad(psi, u[0]) you are telling autograd to find the gradient of psi with respect to u[0]. However, psi never used u[0] to be calculated; only u was used, and hence there’s no gradient.
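A minimal sketch of the practical consequence (illustrative names): take the gradient with respect to the whole u and index the result to get the single partial d psi[0] / d u[0].

import torch

u = torch.tensor([1., 2., 3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)

d_psi0_du = torch.autograd.grad(psi[0], u)[0]   # tensor([2., 0., 0.])
print(d_psi0_du[0])                             # d psi[0] / d u[0] = tensor(2.)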