Thanks for the answer.

Let’s say I have a model N, would it not be enough to do:

`jacobian = torch.autograd.grad(N.output(x), N.output.weight, grad_outputs=[v1, v2, v3])` ?

This should give me the Jacobian-vector products with the vectors v1, v2, and v3.
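For context, here is a minimal runnable sketch of what `torch.autograd.grad` with `grad_outputs` actually computes (the model and names here are illustrative, not your `N`). Note that reverse-mode autograd contracts the vector with the Jacobian from the left, i.e. it returns a *vector*-Jacobian product vᵀJ:

```python
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(4, 3, bias=False)   # weight W has shape (3, 4)
x = torch.randn(4)
y = lin(x)                                # y = W x, shape (3,)

v = torch.randn(3)                        # the "grad_outputs" vector
(vjp,) = torch.autograd.grad(y, lin.weight, grad_outputs=v)

# For y = W x, contracting v with the Jacobian dy/dW gives the
# outer product v xᵀ, which has the same shape as W.
assert vjp.shape == lin.weight.shape
assert torch.allclose(vjp, torch.outer(v, x))
```

So one call with one vector gives one row-combination of the Jacobian; recovering the full Jacobian would take one such call per output dimension (or `torch.autograd.functional.jacobian`).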

Also, the aim of backpropagation is to get this Jacobian. Does the same process happen behind the scenes when I run the previous line of code as when I just fit the model and backpropagation is executed?
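As a small check of that equivalence (a sketch with a toy model, not your setup): `loss.backward()` and `torch.autograd.grad` both run the same reverse-mode pass over the graph; `backward()` just accumulates the result into `.grad` instead of returning it:

```python
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(4, 1, bias=False)
x = torch.randn(4)
loss = lin(x).pow(2).sum()

# Explicit call, keeping the graph alive so we can backprop again.
(g,) = torch.autograd.grad(loss, lin.weight, retain_graph=True)

# Same pass as triggered inside a training loop.
loss.backward()
assert torch.allclose(g, lin.weight.grad)
```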

Put simply, in terms of performance, I am wondering what the gap is between computing this Jacobian inside the training of the model and executing the previous line.