I just took a look at the implementation of `torch.autograd.functional.jacobian`, and it appears to do the following (simplified here for a *single* input and output tensor):

```
input.requires_grad_(True)
output = func(input)
jac_rows = []
for o in output.reshape(-1):          # one backward pass per output element
    if input.grad is not None:
        input.grad.zero_()
    o.backward(retain_graph=True)
    jac_rows.append(input.grad.detach().clone())
jac = torch.stack(jac_rows)
```

So if we have 1d input and output tensors of shape `(N,)` and `(M,)` respectively, then this performs `M` backward passes through the graph. To estimate the Jacobian one could instead use a finite difference approximation, which requires only `N` forward passes. One can also assume that the cost of a forward and a backward pass are roughly the same.
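For illustration, a forward-difference estimate might look roughly like this (a minimal sketch, reusing `func` and `input` from above and assuming a hypothetical step size `eps`):

```
import torch

eps = 1e-6                                  # hypothetical step size
with torch.no_grad():
    base = func(input)                      # reference output, shape (M,)
    jac_cols = []
    for i in range(input.numel()):          # N additional forward passes, one per input element
        perturbed = input.detach().clone()
        perturbed[i] += eps
        jac_cols.append((func(perturbed) - base) / eps)
    jac_fd = torch.stack(jac_cols, dim=1)   # shape (M, N), same layout as the autograd version
```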

Now the thing is, I'm using this `jacobian` function specifically because I want to use the Levenberg-Marquardt algorithm to fit a small network (~100 parameters), which for this kind of problem is more efficient than other gradient-based optimizers. The Levenberg-Marquardt algorithm requires `M >= N`, and hence this backward-based Jacobian approach is less efficient than a forward difference approximation (`M` vs. `N` passes). So I'm wondering why this implementation was chosen for `jacobian` and whether there's a more efficient way. After all, there must be a reason to use `jacobian` rather than just one of the available optimizers. If the reason is to implement a custom optimizer using the Jacobian, then most likely `M > N` and the implementation is inefficient (e.g. in one of my cases I had `N = 84, M = 576`, so the autograd Jacobian is about 7 times slower than a finite difference approximation).
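For context, the Jacobian cost dominates here because each Levenberg-Marquardt step only needs one small linear solve on top of it. A rough sketch of a single step, assuming a residual vector `residuals` of shape `(M,)`, the Jacobian `jac` of shape `(M, N)`, a flattened parameter vector `params` of shape `(N,)`, and a hypothetical damping factor `lam`:

```
import torch

lam = 1e-3                                         # damping factor (assumed fixed here; normally adapted per step)
JTJ = jac.T @ jac                                  # (N, N) Gauss-Newton approximation of the Hessian
g = jac.T @ residuals                              # (N,) gradient of 0.5 * ||residuals||^2
step = torch.linalg.solve(JTJ + lam * torch.diag(torch.diag(JTJ)), g)
params = params - step                             # candidate update for the flattened parameters
```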

Also, it seems strange to compute the backward pass separately for each of the output tensor's elements. This is like using an explicit `for` loop over the batch dimension in the forward pass (which we don't do, of course). The forward pass uses all available parallelization along the batch dimension, so I'm wondering whether something similar can be used for the backward pass and the Jacobian, i.e. performing all the computations for `output` at once.
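Something along these lines is what I have in mind (a sketch only; I believe recent PyTorch versions expose an experimental `vectorize` flag on `jacobian` and a forward-mode `torch.func.jacfwd`, but I may be off on version details):

```
import torch
from torch.autograd.functional import jacobian

# Reverse mode, but with the M vector-Jacobian products batched
# (the vectorize flag is marked experimental, as far as I can tell).
jac_batched = jacobian(func, input, vectorize=True)

# Forward-mode alternative whose cost scales with N rather than M,
# which would suit the M >= N case (torch.func.jacfwd in PyTorch 2.x).
jac_forward = torch.func.jacfwd(func)(input)
```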