Hello! I want to get the Jacobian matrix using PyTorch automatic differentiation. I have this code so far:

import numpy as np
import torch

x = np.arange(1, 3, 1)
x = torch.from_numpy(x).reshape(len(x), 1)
x = x.float()
x.requires_grad = True
w1 = torch.randn((2, 2), requires_grad=True)
y = w1 @ x
jac = torch.autograd.grad(y, x, grad_outputs=y.data.new(y.shape).fill_(1), create_graph=True)

So I want jac to recover w1. But in this form, jac has the same dimensions as y and I am not sure what to do. Can someone tell me how to fix that function such that I get what I need? Thank you!

I’m not particularly sure of the math here, but shouldn’t the following get you the required matrix? You’re trying to compute gradient(y, x), as per the docs. Calling y.backward() would store the gradients with respect to w1 in w1.grad, as required by gradient descent. The problem if you try that directly is that it throws an error saying the gradient can’t be computed implicitly (y is not a scalar). As a workaround, you can call y[i].backward(retain_graph=True) for each output element and let the required gradients accumulate in w1.grad across the calls.

>>> x = np.arange(1,3,1)
>>> x = torch.from_numpy(x).reshape(len(x),1)
>>> x = x.float()
>>> x.requires_grad = True
>>>
>>> w1 = torch.randn((2,2), requires_grad = True)
>>> y = w1@x
>>>
>>> # Compute gradients of w1 . x
...
>>> y[0].backward(retain_graph=True)
>>> w1.grad
tensor([[1., 2.],
        [0., 0.]])
>>> y[1].backward(retain_graph=True)
>>> w1.grad
tensor([[1., 2.],
        [1., 2.]])

This is not really what I need to do. I want the returned variable to be equal to the original w1 (that is, by definition, the Jacobian here). For example, the w1 is this:

torch.autograd.grad computes the product of the Jacobian with the vector given in grad_outputs. With

grad_outputs=y.data.new(y.shape).fill_(1)

you are passing the vector [1., 1.], so what you see is the expected behaviour. To compute the Jacobian itself, you have to multiply with [1., 0.] to extract the first column, then with [0., 1.] to extract the second column.

Here is the complete code:

import numpy as np
import torch

x = np.arange(1, 3, 1)
x = torch.from_numpy(x).reshape(len(x), 1)
x = x.float()
x.requires_grad = True
w1 = torch.randn((2, 2), requires_grad=False)
y = w1 @ x
print(w1)
jacT = torch.zeros(2, 2)
for i in range(2):
    output = torch.zeros(2, 1)
    output[i] = 1.
    jacT[:, i:i+1] = torch.autograd.grad(y, x, grad_outputs=output, retain_graph=True)[0]
# jacT now holds the transpose of the Jacobian, i.e. w1.T

Thanks a lot! That’s what I need. Is there, however, a way to do this without the for loop? For my problem I need to add the Jacobian to the loss function, so I have to compute it at each iteration. Also, the input and output are not just 2-dimensional, so that for loop would slow down the whole code quite a bit.

Maybe you can formulate your problem in another way, so that the Jacobian is not needed? Could you elaborate a little more? What are you trying to achieve, and in which way do you want to include the Jacobian in the loss function?

Hello! I want the Jacobian between the output layer and some hidden layer to be as close as possible to the identity. So mathematically, I need to compute the derivative of each output node with respect to each node in this hidden layer, build a matrix out of these derivatives, and minimize abs(Jacobian - Identity). It is for a physics-related project and I actually need the Jacobian. The physics formulation is quite easy; I just don’t know how to implement it without a for loop. Thank you!
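As an illustration of the penalty described above, here is a minimal sketch. It stands in for the real network with a hypothetical linear hidden-to-output map W, and uses torch.autograd.functional.jacobian to avoid the explicit Python loop; create_graph=True keeps the Jacobian differentiable so the penalty can be backpropagated:

```python
import torch

torch.manual_seed(0)

# Hypothetical hidden-to-output map; a real network layer would go here.
W = torch.nn.Parameter(torch.randn(3, 3))

def f(h):
    return W @ h

h = torch.randn(3)

# Full Jacobian dy/dh in one call; create_graph=True makes it
# differentiable, so the penalty below flows back into W.
J = torch.autograd.functional.jacobian(f, h, create_graph=True)

# Penalty pushing the Jacobian towards the identity.
penalty = (J - torch.eye(3)).abs().sum()
penalty.backward()
```

For this linear map J simply equals W, so minimizing the penalty drives W towards the identity; for a real network, f would be the sub-network between the hidden layer and the output.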

Hey folks, I have some exciting news on this front. I was trying to solve the same problem, but for a large network that will not work with batching. Even then, the batch method described here is still very slow. Instead, I devised a way that is highly efficient for network inputs that are relatively small (in my case a VAE decoder with a 32-dimensional input vector). All you have to do is use a finite-difference Jacobian, and then the chain rule becomes completely unnecessary. My implementation is not general, so I will not share it here (it’s also in C++ PyTorch), but you can find an easy MATLAB version of finite-difference Jacobians, from my wonderful math professors, that I based mine on. Note that I had to tune the delta to work for the neural network, since their MATLAB code is in double precision.
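The poster's implementation isn't shared, but the finite-difference idea can be sketched like this (a central-difference version; the step size eps is an assumption and, as noted above, typically needs tuning for float32 networks):

```python
import torch

def fd_jacobian(f, x, eps=1e-3):
    """Central-difference Jacobian of f at x (x a 1-D tensor).
    Costs two forward passes per input dimension, so it is cheap
    when the input is small (e.g. a 32-dim latent vector)."""
    y = f(x)
    J = torch.zeros(y.numel(), x.numel())
    for i in range(x.numel()):
        dx = torch.zeros_like(x)
        dx[i] = eps  # perturb one input coordinate at a time
        J[:, i] = (f(x + dx) - f(x - dx)).flatten() / (2 * eps)
    return J

# Sanity check on a linear map, whose Jacobian is the matrix itself.
W = torch.randn(4, 3)
J = fd_jacobian(lambda v: W @ v, torch.randn(3))
print(torch.allclose(J, W, atol=1e-3))  # True
```

The loop here runs over input dimensions rather than output dimensions, which is the win the poster describes when the input is much smaller than the output.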

I found out that Autograd now has a functional module that solves this problem. Specifically, torch.autograd.functional.jacobian, given a function and input variables, returns the Jacobian. There are also functions to compute the Hessian, Jacobian-vector-product, etc.
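On the thread's original example, this looks like the following. Note that the returned Jacobian has shape output_shape + input_shape, so a reshape is needed to get the plain 2x2 matrix:

```python
import numpy as np
import torch

x = torch.from_numpy(np.arange(1, 3, 1)).reshape(-1, 1).float()
w1 = torch.randn(2, 2)

J = torch.autograd.functional.jacobian(lambda v: w1 @ v, x)
# J has shape (2, 1, 2, 1): output shape followed by input shape.
J = J.reshape(2, 2)
print(torch.allclose(J, w1))  # True: the Jacobian recovers w1
```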

Hi, torch.autograd.functional.jacobian seems to be on the right path. Does anyone know if there is a counterpart for when its first argument func is itself part of a computation graph (like the forward function of a network, or a gradient)?