requires_grad=True for two variables

Hi all,

I have a model that has multiple inputs, and I was wondering if it is possible to find the gradients of the output with respect to the inputs.

The layout is as follows:

output = f(inp1, inp2, …)

To train the model, we solve the MSE optimization problem: minimize loss = ||output - target||².

Then after the model is trained I am interested in finding the sensitivities:
d(output)/d(inp1), d(output)/d(inp2), …

I believe it is doable, but I am not sure how. Any help/hint is appreciated.


First, make sure your inputs require gradients:

  • If they don’t, just call requires_grad_() on them before giving them to your net
  • If they do and are leaves (inp1.is_leaf is True), then you’re good to go
  • If they do but are not leaves, you can do inp1.retain_grad() to make sure the .grad field will be populated properly.

Then if output is a scalar, you can simply .backward() on the output.
Otherwise, if you want the full Jacobian, you can use the torch.autograd.functional package (available since PyTorch 1.5).
If you want the sum of the gradients for each element in the output, you can do output.sum().backward().
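The steps above can be sketched as follows (the model f and the input sizes here are placeholders, not the original poster's actual code):

```python
import torch

# Hypothetical two-input model; any differentiable function works here.
def f(inp1, inp2):
    return (inp1 * 2 + inp2 ** 2).sum()  # scalar output

inp1 = torch.randn(5)
inp2 = torch.randn(5)

# Make sure the inputs require gradients before the forward pass.
inp1.requires_grad_()
inp2.requires_grad_()

output = f(inp1, inp2)
output.backward()  # output is a scalar, so no grad_output is needed

# Sensitivities d(output)/d(inp1) and d(output)/d(inp2)
print(inp1.grad)  # d/d(inp1) of sum(2*inp1 + inp2**2) = 2 everywhere
print(inp2.grad)  # equals 2 * inp2
```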


Hi albanD,

Many thanks for the response. Could you elaborate on what you mean by leaf? I will try to do it along the lines you suggested and come back to you. Thanks again!

A leaf Tensor is a Tensor that has no history (it was not produced by an operation recorded by autograd). You can check it with your_tensor.is_leaf (note that it is an attribute, not a method).
When calling .backward(), it will populate the .grad field of all the leaf Tensors that require gradients and were used in the computation of the output.
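A small illustration of the leaf/non-leaf distinction (toy tensors, not from the thread):

```python
import torch

a = torch.randn(3, requires_grad=True)  # created directly: a leaf
b = a * 2                               # result of an op: not a leaf

print(a.is_leaf)  # True
print(b.is_leaf)  # False

b.retain_grad()   # ask autograd to also keep b.grad
b.sum().backward()
print(a.grad)     # populated: d(sum(2*a))/da = 2 everywhere
print(b.grad)     # populated thanks to retain_grad(): all ones
```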

Thanks again for the explanation

I have the following after the output.backward() statement:

inpt2_grad = grad(output, inp2, torch.ones(inp2.size()[0], 1, device=dev), create_graph=True, retain_graph=True)

inpt2_grad = inpt2_grad.cpu().numpy()

I get the following error at the line where I have: inpt2_grad = grad(output, inp2, torch…
Mismatch in shape: grad_output[0] has a shape of torch.Size([15625, 1]) and output[0] has a shape of torch.Size([]).


If I do the following
inp2_grad = inp2.grad
inp2_grad = inp2_grad.detach().cpu().numpy()

I get None when I print inp2_grad
Also, I cannot convert it to a numpy array.


The issue here is that grad_output should have the same size as the output. In this case, the output is a scalar, so you can actually leave the grad_output argument empty; it defaults to a Tensor containing a single 1.

autograd.grad does not populate the .grad field of the Tensor; it returns the gradients directly, so you should read them the same way you do in your post above. Note that it returns a tuple though, so you might need to do inpt2_grad, = xxx.
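Putting those two points together, a sketch of what the corrected call could look like (the toy output below stands in for the original model):

```python
import torch
from torch.autograd import grad

inp2 = torch.randn(4, requires_grad=True)
output = (inp2 ** 2).sum()  # scalar, so grad_output can be omitted

# grad() returns a tuple with one entry per input Tensor passed in,
# hence the trailing comma to unpack the single result.
inpt2_grad, = grad(output, inp2, create_graph=True, retain_graph=True)

# detach() is needed before .numpy() because create_graph=True makes
# the returned gradient itself part of the autograd graph.
inpt2_grad_np = inpt2_grad.detach().cpu().numpy()
print(inpt2_grad_np.shape)  # (4,)
```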

Thanks, albanD.

I did the following:
inpt2_grad = grad(output, inpt2, create_graph=True, retain_graph=True)

But now I am encountering a new issue:
One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Any insights on what allow_unused does? I tried to read the documentation, but I got lost.

Also, if I add the allow_unused=True argument, grad(output, …) returns None.

This error means that pytorch cannot find any link between output and inpt2, i.e. output was not computed in a differentiable way from inpt2.
So you want to double-check your code to make sure that output does depend on inpt2 (or, if it does not, you can remove this call and replace the result with a Tensor full of zeros).
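A small illustration of allow_unused (toy tensors; x2 intentionally does not feed into out):

```python
import torch
from torch.autograd import grad

x1 = torch.randn(3, requires_grad=True)
x2 = torch.randn(3, requires_grad=True)

out = (x1 ** 2).sum()  # x2 is never used in computing out

# Without allow_unused=True this call raises the
# "appears to not have been used in the graph" error.
g1, g2 = grad(out, (x1, x2), allow_unused=True)
print(g1)  # 2 * x1
print(g2)  # None: out does not depend on x2
```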

Thanks. I see.

So let me lay out the code, and you might be able to tell me if there is something wrong:

class Grad_finder:

    def fun1(x1, x2, x3):
        # we minimize the loss
        for i in range(epochs):
            L1 = f(x1, x3)  # f is a neural network model
            L2 = g(L1, x2)  # g is a defined function
            L3 = h(x1, x3)  # h is a defined function
            loss = L2 - L3
        # now f is trained
        L1 = f(x1, x3)
        L2 = g(L1, x2)
        return L2

    def fun2(x1, x2, x3):
        L2 = fun1(x1, x2, x3)
        # here I want to find dL2/dx2
        return dL2dx2

dL2dx2 = Grad_finder.fun2(x1, x2, x3)

Ok, so the only potentially differentiable link I can see between L2 and x2 is L2 = g(L1, x2).
So what is g here?


I think this is what I am missing: the relationship between L2 and x2. They are only implicitly related. I will have to revisit the problem and see how to proceed. Many thanks, albanD.