Calculating Jacobian in a Differentiable Way

Is there a general way to calculate the Jacobian of a module such that we retain the graph and can backprop through operations on the Jacobian as well?

Something similar to how WGAN operates but defined on more than a single output.


Do you mean backward of backward (a second derivative)?
Try loss.backward(create_graph=True).

See https://github.com/pytorch/pytorch/pull/1016

import torch
from torch.autograd import Variable

x = Variable(torch.randn(2, 2), requires_grad=True)
# First backward pass; create_graph=True keeps the graph so we can differentiate through the gradients.
x.mul(2).sum().backward(create_graph=True)
y = x.grad.mul(2).sum()
# This accumulates grad of grad into x.grad, adding together the results of both backward() calls.
y.backward()

Yes, but in that case the output is a single number, so you can calculate the derivative in a straightforward way.

I’m wondering how you do this for multiple outputs. Here is an article describing the Jacobian matrix I’m referring to: https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant

I think this function is a general way to accomplish what I was describing. I borrowed the code from: https://github.com/pytorch/pytorch/blob/85a7e0f/test/test_distributions.py#L1501

import torch
from torch.autograd import grad

def jacobian(inputs, outputs):
    # One backward pass per output column; create_graph=True keeps the Jacobian itself differentiable.
    return torch.stack([grad([outputs[:, i].sum()], [inputs], retain_graph=True, create_graph=True)[0]
                        for i in range(outputs.size(1))], dim=-1)
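
For example (my own usage sketch, not from the linked test file, assuming a batched module such as nn.Linear and a PyTorch version where tensors take requires_grad directly):

net = torch.nn.Linear(3, 4)
x = torch.randn(5, 3, requires_grad=True)
J = jacobian(x, net(x))  # shape (5, 3, 4): gradients of each output column, stacked along the last dim
J.norm().backward()      # thanks to create_graph=True, the Jacobian itself is differentiable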

Have you measured the running time of this method? One concern is that the for loop makes the backprop time grow linearly with the output dimension. I’m wondering whether there is a way to get the Jacobian directly, without the for loop.


The following does the trick with a single call to backward, by taking advantage of the case where the function accepts batched inputs. You can modify it by passing retain_graph=True to backward and returning .grad instead of .grad.data.
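
Here is a minimal sketch of that batching trick (the helper name jacobian_single_backward, the detached repeated input copy, and the identity grad_outputs are my own choices; it assumes the function treats batch rows independently):

import torch

def jacobian_single_backward(f, x):
    # Hypothetical helper: Jacobian of f at a single input x of shape (n_in,),
    # assuming f maps (N, n_in) -> (N, n_out) and handles batch rows independently.
    n_out = f(x.detach().unsqueeze(0)).size(1)  # probe the output dimension
    x_rep = x.detach().unsqueeze(0).repeat(n_out, 1).requires_grad_(True)
    y = f(x_rep)  # shape (n_out, n_out); every batch row sees the same input
    # An identity grad_outputs selects output i for batch row i, so a single
    # backward call fills x_rep.grad with the full Jacobian, one row per copy.
    y.backward(torch.eye(n_out))
    return x_rep.grad  # shape (n_out, n_in)

For example, with lin = torch.nn.Linear(3, 4) and x = torch.randn(3), jacobian_single_backward(lin, x) returns a (4, 3) matrix equal to lin.weight. As the post above notes, passing retain_graph=True to backward and returning .grad instead of .grad.data lets you keep working with the graph (create_graph=True would be needed to differentiate through the Jacobian itself).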
