# Autograd Jacobian vs. Finite Difference Approximation

I just took a look at the implementation of `torch.autograd.functional.jacobian`, and it appears to do the following (simplified for a single input and output tensor):

```
input.requires_grad_(True)
output = func(input)
jac_rows = []
for o in output:                        # one backward pass per output element
    input.grad = None
    o.backward(retain_graph=True)
    jac_rows.append(input.grad.clone())
jac = torch.stack(jac_rows)             # shape (M, N)
```

So if we have 1d input and output tensors of shape `(N,)` and `(M,)` respectively, this performs `M` backward passes through the graph. To estimate the Jacobian one could instead use a finite difference approximation, which requires `N` forward passes. One can also assume that the costs of a forward and a backward pass are roughly the same.
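
For reference, a minimal sketch of such a forward-difference estimate (assuming `func` maps a tensor of shape `(N,)` to a tensor of shape `(M,)`; `fd_jacobian` and `eps` are just illustrative names, not library API):

```
import torch

def fd_jacobian(func, x, eps=1e-6):
    """Forward-difference Jacobian estimate: one extra forward pass per input element."""
    with torch.no_grad():
        y0 = func(x)                               # baseline output, shape (M,)
        jac = torch.zeros(y0.numel(), x.numel())
        for i in range(x.numel()):                 # N forward passes
            x_pert = x.clone()
            x_pert.view(-1)[i] += eps
            jac[:, i] = (func(x_pert) - y0).flatten() / eps
    return jac                                     # shape (M, N)
```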

Now the thing is, I'm using this `jacobian` function specifically because I want to use the Levenberg-Marquardt algorithm to fit a small network (~100 parameters), which is more efficient than other gradient-based optimizers for such problems. The Levenberg-Marquardt algorithm requires `M >= N`, so this backward (reverse-mode) Jacobian approach is less efficient than a forward difference approximation (M vs. N passes). Now I'm wondering why this implementation was chosen for `jacobian` and whether there is a more efficient way. After all, there must be a reason to use `jacobian` rather than just one of the available optimizers. If the reason is to implement a custom optimizer using the Jacobian, then most likely M > N and the implementation is inefficient (e.g. in one of my cases I had `N = 84, M = 576`, so the autograd Jacobian is about 7 times slower than the finite difference approximation).
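
For context, each Levenberg-Marquardt iteration needs the full `(M, N)` Jacobian of the residuals, which is why the cost of building it dominates. A rough sketch of one damped Gauss-Newton step (`lm_step`, `residual_fn`, `params`, and `lam` are placeholder names, and this uses the simple λI damping variant):

```
import torch

def lm_step(residual_fn, params, lam=1e-3):
    """One Levenberg-Marquardt update for residuals of shape (M,), params of shape (N,), M >= N."""
    r = residual_fn(params)                                      # residuals, shape (M,)
    J = torch.autograd.functional.jacobian(residual_fn, params)  # shape (M, N)
    A = J.T @ J + lam * torch.eye(params.numel())                # damped normal equations
    delta = torch.linalg.solve(A, -J.T @ r)
    return params + delta
```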

Also, it seems strange to compute the backward pass separately for each of the output tensor's elements. This is like using an explicit `for` loop over the batch dimension for the forward pass (which we of course don't do); see the small sketch below. The forward pass uses all parallelization capabilities for the batch dimension, so I'm wondering whether something similar can be used for the backward pass and the Jacobian, i.e. performing all the computations for `output` at once.
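
To make the analogy concrete (`model` and `batch` are just illustrative):

```
import torch

model = torch.nn.Linear(84, 576)      # illustrative sizes from the example above
batch = torch.randn(32, 84)

# What we never do for the forward pass: an explicit loop over the batch dimension ...
out_looped = torch.stack([model(x) for x in batch])
# ... because a single batched call does the same work in parallel:
out_batched = model(batch)
assert torch.allclose(out_looped, out_batched, atol=1e-6)
```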

I have almost the same issue as you. I have a model with input size N = 15 and output size M = 55, and I need to compute the Jacobian with respect to the 15 input parameters.

Using `torch.autograd.functional.jacobian` is almost 8 times slower than the finite difference method.

I am curious whether autograd can be implemented in a way which is faster than finite difference.


You want to use the autograd jacobian with "forward-mode" for the `strategy` argument in this case. In forward mode it will do N passes, so it will be as fast as finite differences, except that it will be exact.
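
A minimal example of that call (using a toy `func`; note that `strategy="forward-mode"` currently has to be combined with `vectorize=True`):

```
import torch

def func(x):                      # placeholder: any function with M >= N outputs
    return torch.stack([x.sum(), (x ** 2).sum(), x.prod()])

x = torch.randn(2)

# Forward-mode: one pass per *input* element, and exact (no finite-difference error).
J = torch.autograd.functional.jacobian(
    func, x, strategy="forward-mode", vectorize=True
)
print(J.shape)                    # torch.Size([3, 2])
```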

There's also `torch.func.jacfwd` and `torch.func.jacrev` (these use vmap underneath by default to vectorize the gradient computations).
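
For example, with a reasonably recent PyTorch (same toy `func` as above): `jacfwd` builds the Jacobian one column per input element (forward mode), `jacrev` one row per output element (reverse mode), both vectorized via vmap.

```
import torch
from torch.func import jacfwd, jacrev

def func(x):
    return torch.stack([x.sum(), (x ** 2).sum(), x.prod()])

x = torch.randn(2)

J1 = jacfwd(func)(x)   # forward-mode: cost scales with the number of inputs (N)
J2 = jacrev(func)(x)   # reverse-mode: cost scales with the number of outputs (M)
assert torch.allclose(J1, J2)
```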

https://pytorch.org/docs/stable/func.api.html#module-torch.func