requires_grad=True for two variables

Hi all,

I have a model that has multiple inputs, and I was wondering if it is possible to find the gradients of the output with respect to the inputs.

The layout is as follows:

output = f(inp1, inp2, …)

To train the model, we minimize the MSE loss ||output - target||^2.

Then after the model is trained I am interested in finding the sensitivities:
d(output)/d(inp1), d(output)/d(inp2), …

I believe it is doable, but I am not sure how. Any help/hint is appreciated.

Hi,

First, make sure your inputs require gradients:

  • If they don’t, just call requires_grad_() on them before feeding them to your net.
  • If they do and are leaves (inp1.is_leaf is True), then you’re good to go.
  • If they do but are not leaves, you can call inp1.retain_grad() to make sure the .grad field will be populated properly.

Then if output is a scalar, you can simply .backward() on the output.
Otherwise, if you want the full jacobian, you can use the torch.autograd.functional module (available since PyTorch 1.5).
If you want the sum of the gradients for each element in the output, you can do output.sum().backward().
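
For example, here is a minimal sketch (the two-input model net below is just a stand-in for your own f):

import torch
from torch.autograd.functional import jacobian

net = torch.nn.Linear(3, 1)                 # stand-in for your trained model

inp1 = torch.randn(5, 3).requires_grad_()   # leaf tensors: .grad will be populated
inp2 = torch.randn(5, 3).requires_grad_()

output = net(inp1 + inp2)                   # any differentiable use of both inputs

# Non-scalar output: sum() gives the sum of the gradients over the batch.
output.sum().backward()
print(inp1.grad.shape)                      # torch.Size([5, 3])
print(inp2.grad.shape)                      # torch.Size([5, 3])

# Or the full jacobian instead (PyTorch >= 1.5):
J1, J2 = jacobian(lambda a, b: net(a + b), (inp1, inp2))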


Hi albanD,

Many thanks for the response. Could you elaborate on what you mean by "leaf"? I will try to do it along the lines you suggested and get back to you. Thanks again!

A leaf Tensor is a Tensor that has no history (it was not produced by an operation recorded by autograd). You can check it with your_tensor.is_leaf (an attribute, not a method).
When calling .backward(), it will populate the .grad field of all the leaf Tensors that require gradients and were used in the computation of the output.
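
A small example:

import torch

a = torch.randn(3, requires_grad=True)   # created directly by you: a leaf
b = a * 2                                # produced by an op: not a leaf

print(a.is_leaf)   # True  (is_leaf is an attribute, not a method)
print(b.is_leaf)   # False

b.retain_grad()    # ask autograd to keep b's gradient as well
b.sum().backward()
print(a.grad)      # populated: leaf that requires grad
print(b.grad)      # populated only because of retain_grad()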

Thanks again for the explanation

I have the following after the output.backward() statement:

inpt2_grad = grad(output, inp2, torch.ones(inp2.size()[0], 1, device=dev), create_graph=True, retain_graph=True)

inpt2_grad = inpt2_grad.cpu().numpy()

I get the following error at the line where I have: inpt2_grad = grad(output, inp2, torch…
Mismatch in shape: grad_output[0] has a shape of torch.Size([15625, 1]) and output[0] has a shape of torch.Size([]).

Also,

If I do the following
inp2_grad = inp2.grad
print(inp2_grad)
inp2_grad = inp2_grad.detach().cpu().numpy()

I get None when I print inp2_grad, so I cannot convert it to a NumPy array either.

Hi,

The issue here is that grad_outputs must have the same shape as the output. In this case, the output is a scalar, so you can actually leave the grad_outputs argument out; it will default to a Tensor containing a single 1.

autograd.grad does not populate the .grad field of the Tensor; it returns the gradients directly, and you should retrieve them the same way you do in your post above. Note that it returns a tuple, though, so you might need to do inpt2_grad, = xxx.
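
Something along these lines should work (the scalar output below is just a placeholder for your model's output):

import torch
from torch.autograd import grad

inp2 = torch.randn(15625, 1, requires_grad=True)
output = (inp2 ** 2).sum()   # placeholder scalar output

# grad() returns a tuple with one entry per input, hence the unpacking comma.
inpt2_grad, = grad(output, inp2, create_graph=True, retain_graph=True)

# create_graph=True makes the result require grad, so detach before numpy().
inpt2_grad = inpt2_grad.detach().cpu().numpy()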

Thanks, albanD.

I did the following:
inpt2_grad = grad(output, inpt2, create_graph=True, retain_graph=True)

But now I am encountering a new issue:
One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Any insight into what allow_unused does? I tried to read the documentation, but I got lost.

Also, if I pass allow_unused=True, grad(output, …) returns None.

This error means that PyTorch cannot find any link between output and inpt2, meaning that output was not computed in a differentiable way from inpt2.
So you want to double-check your code to make sure that output does depend on inpt2 (or, if it does not, you can remove this call and replace the result with a Tensor full of 0s).
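
To illustrate what allow_unused does with a toy example:

import torch
from torch.autograd import grad

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)
out = (x ** 2).sum()   # out does not depend on y at all

# Without allow_unused=True this raises the "appears to not have been used
# in the graph" error; with it, the gradient for the unused input is None.
gx, gy = grad(out, (x, y), allow_unused=True)
print(gx)              # a real gradient
print(gy)              # None; substitute zeros if you need a dense gradient
gy = torch.zeros_like(y) if gy is None else gy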

Thanks. I see.

So let me lay out the code, and you might be able to tell me if there is something wrong:

class Grad_finder:

    @staticmethod
    def fun1(x1, x2, x3):
        x1.requires_grad_(True)   # x1 and x2 are leaves, so .grad will be populated;
        x2.requires_grad_(True)   # retain_grad() would only be needed for non-leaves

        # we minimize the loss
        for i in range(epochs):
            L1 = f(x1, x3)  # f is a neural network model
            L2 = g(L1, x2)  # g is a defined function
            L3 = h(x1, x3)  # h is a defined function

            loss = L2 - L3
            optimizer.zero_grad()
            loss.backward(retain_graph=True)
            optimizer.step()

        # now f is trained
        L1 = f(x1, x3)
        L2 = g(L1, x2)

        return L2

    @staticmethod
    def fun2(x1, x2, x3):
        L2 = Grad_finder.fun1(x1, x2, x3)

        # Here I want to find:
        #   dL2/dx2

        return dL2dx2

dL2dx2 = Grad_finder.fun2(x1, x2, x3)

Ok, so the only potentially differentiable link I can see between L2 and x2 is L2 = g(L1, x2).
So what is g here?
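
For reference, if g did use x2 in a differentiable way, the gradient would be straightforward to get (a minimal sketch, with a placeholder standing in for g):

import torch
from torch.autograd import grad

x2 = torch.randn(10, 1, requires_grad=True)
L1 = torch.randn(10, 1)    # pretend output of f
L2 = (L1 * x2).sum()       # placeholder for g(L1, x2): a scalar computed from x2

dL2dx2, = grad(L2, x2)     # works because L2 has a differentiable link to x2
print(dL2dx2.shape)        # torch.Size([10, 1])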


I think this is what I am missing: the relationship between L2 and x2. They are implicitly related. I have to revisit the problem and see how I can proceed. Many thanks, albanD.