Get the Jacobian using autograd

Hello! I want to get the Jacobian matrix using PyTorch automatic differentiation. I have this code so far:

import numpy as np
import torch

x = np.arange(1,3,1)
x = torch.from_numpy(x).reshape(len(x),1)
x = x.float()
x.requires_grad = True

w1 = torch.randn((2,2), requires_grad = True)
y = w1@x

jac = torch.autograd.grad(y, x, grad_outputs=y.data.new(y.shape).fill_(1), create_graph=True)

So I want jac to recover w1. But in this form, jac has the same dimensions as y and I am not sure what to do. Can someone tell me how to fix that function such that I get what I need? Thank you!

I’m not particularly sure of the math here, but shouldn’t the following get you the required matrix? You’re trying to compute gradient(y, x) as per the docs. Calling y.backward() would store the gradients needed for gradient descent, but if you try that directly it throws an error because the gradient of a non-scalar output can’t be computed implicitly. As a workaround, you can call y[i].backward(retain_graph=True) for each output component and accumulate the required gradient in w1.grad.

>>> import numpy as np
>>> import torch
>>> x = np.arange(1,3,1)
>>> x = torch.from_numpy(x).reshape(len(x),1)
>>> x = x.float()
>>> x.requires_grad = True
>>> 
>>> w1 = torch.randn((2,2), requires_grad = True)
>>> y = w1@x
>>> 
>>> # Compute gradients of w1 . x
... 
>>> y[0].backward(retain_graph=True)
>>> w1.grad
tensor([[1., 2.],
        [0., 0.]])
>>> y[1].backward(retain_graph=True)
>>> w1.grad
tensor([[1., 2.],
        [1., 2.]])

This is not quite what I need. I want the returned variable to be equal to the original w1 (which is, by definition, the Jacobian here). For example, say w1 is:

tensor([[ 0.9947, -0.6104],
        [ 0.4867, -1.2023]], requires_grad=True)

If I call the function

l1 = torch.autograd.grad(y, x, grad_outputs=y.data.new(y.shape).fill_(1), create_graph=True)

I get this:

(tensor([[ 1.4815],
         [-1.8127]], grad_fn=<MmBackward>),)

For some reason this output gives the column sums of the original matrix. Instead I want the actual original matrix, i.e. the 2 by 2:

tensor([[ 0.9947, -0.6104],
        [ 0.4867, -1.2023]], requires_grad=True)

Hi Silviu,

torch.autograd.grad

computes the vector-Jacobian product, i.e. the product of the Jacobian with the vector given in grad_outputs. Here, in grad_outputs you pass the vector [1., 1.]:

grad_outputs=y.data.new(y.shape).fill_(1)

So the result is the vector-Jacobian product [1., 1.] · w1, i.e. the column sums of w1 that you observed, and what you see is the expected behaviour. To compute the full Jacobian, you have to multiply with [1., 0.] to extract the first row, then with [0., 1.] to extract the second row (the code below stores these as the columns of the transposed Jacobian, jacT).

Here is the complete code:

import numpy as np
import torch

x = np.arange(1,3,1)
x = torch.from_numpy(x).reshape(len(x),1)
x = x.float()
x.requires_grad = True

w1 = torch.randn((2,2), requires_grad = False)
y = w1@x

print(w1)
jacT = torch.zeros(2,2)
for i in range(2):
    output = torch.zeros(2,1)
    output[i] = 1.
    # Each call gives one vector-Jacobian product, i.e. one row of the Jacobian,
    # stored here as a column of the transposed Jacobian jacT.
    jacT[:,i:i+1] = torch.autograd.grad(y, x, grad_outputs=output, retain_graph=True)[0]

Does this help?


Thanks a lot! That’s what I need. Is there, however, a way to do this without the for loop? For my problem I need to add the Jacobian to the loss function, so I need to compute it after each iteration. Also, the input and output are not just 2-dimensional, so that for loop would slow the whole code down quite a bit.

Hi Silviu,

maybe you can formulate your problem in another way, so that the Jacobian is not needed? Maybe you can elaborate a little bit more? What are you trying to achieve and in which way do you want to include the Jacobian into the loss function?

Hello! I want the Jacobian between the output layer and some hidden layer to be as close as possible to the identity. So mathematically, I need to compute the derivative of each output node with respect to each node in this hidden layer, build a matrix out of these derivatives, and minimize abs(Jacobian - Identity). It is for a physics-related project and I actually need the Jacobian. The physics formulation is quite easy; I just don’t know how to implement it without a for loop. Thank you!
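
For illustration, here is a minimal sketch of that idea built on the loop-based torch.autograd.grad approach from earlier in the thread. The encoder/head modules, layer sizes, and the 0.1 penalty weight are made up for the example; the key detail is create_graph=True, which keeps the Jacobian differentiable so the penalty can be backpropagated:

import torch
import torch.nn as nn

# Hypothetical two-stage network: encoder produces the hidden layer, head produces the output.
encoder = nn.Sequential(nn.Linear(4, 3), nn.Tanh())
head = nn.Linear(3, 3)

x = torch.randn(1, 4)
h = encoder(x)                      # hidden layer, shape (1, 3)
y = head(h)                         # output layer, shape (1, 3)

# Build the Jacobian dy/dh row by row, keeping the graph so the penalty is differentiable.
rows = []
for i in range(3):
    v = torch.zeros_like(y)
    v[0, i] = 1.0
    (grad_h,) = torch.autograd.grad(y, h, grad_outputs=v, create_graph=True)
    rows.append(grad_h[0])          # row i: derivatives of y_i w.r.t. each hidden node
jacobian = torch.stack(rows)        # shape (3, 3)

# Penalize the distance from the identity and add it to the usual task loss.
task_loss = y.pow(2).mean()         # placeholder for the real loss
loss = task_loss + 0.1 * (jacobian - torch.eye(3)).abs().sum()
loss.backward()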


Hello, are you able to do this in PyTorch?

Hey folks, I have some exciting news on this front. I was trying to solve the same problem, but for a large network that will not work with batching, and even then the batch method described here is still very slow. Instead I devised a way that is highly efficient for network inputs that are relatively small (in my case a VAE decoder with a 32-dimensional input vector). All you have to do is use a finite-difference Jacobian, and then the chain rule becomes completely unnecessary. My implementation is not general, so I will not share it here (it’s also in C++ PyTorch), but you can find an easy MATLAB version of finite-difference Jacobians that I based mine off of here, from my wonderful math professors! Note that I had to tune the delta to work for the neural network, since their MATLAB code is in double precision.
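
For reference, a minimal, generic sketch of that finite-difference idea in plain PyTorch (this is not the C++ implementation mentioned above; the test function, the central-difference scheme, and delta=1e-3 are illustrative choices, and the delta typically needs tuning in float32):

import torch

def finite_difference_jacobian(f, x, delta=1e-3):
    # Approximate the Jacobian of f at x with central differences.
    # f maps a 1-D tensor of size n to a 1-D tensor of size m; returns an (m, n) tensor.
    with torch.no_grad():
        cols = []
        for i in range(x.numel()):
            e = torch.zeros_like(x)
            e[i] = delta
            cols.append((f(x + e) - f(x - e)) / (2 * delta))   # i-th column: df/dx_i
        return torch.stack(cols, dim=1)

# Sanity check: for a linear map the estimate should match the matrix itself.
w1 = torch.randn(2, 2)
x = torch.tensor([1.0, 2.0])
print(finite_difference_jacobian(lambda v: w1 @ v, x))   # approximately w1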

I found out that Autograd now has a functional module that solves this problem. Specifically, torch.autograd.functional.jacobian, given a function and input variables, returns the Jacobian. There are also functions to compute the Hessian, Jacobian-vector-product, etc.
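
For example, applied to the snippet from the start of this thread (the lambda and the final reshape are just for illustration; jacobian returns a tensor shaped output-shape + input-shape, here (2, 1, 2, 1)):

import torch
from torch.autograd.functional import jacobian

w1 = torch.randn(2, 2)
x = torch.tensor([[1.0], [2.0]])

# d(w1 @ x)/dx, returned with shape (2, 1, 2, 1) = y.shape + x.shape.
J = jacobian(lambda inp: w1 @ inp, x)
print(J.reshape(2, 2))   # recovers w1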

Hi, torch.autograd.functional.jacobian seems to be on the right path. Does anyone know if there is a counterpart when its first argument func comes from a computation graph (like the forward function of a network, or a gradient)?
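
Not a definitive answer, but torch.autograd.functional.jacobian accepts any Python callable, so one option is to pass the network (or a lambda around its forward) directly; the toy network below is just a placeholder:

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

net = nn.Sequential(nn.Linear(3, 3), nn.Tanh(), nn.Linear(3, 3))
x = torch.randn(3)

# An nn.Module is itself a callable, so it can be used as func directly.
J = jacobian(net, x)   # shape (3, 3): d net(x)_i / d x_j
print(J)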