Thanks for the answer.
Let’s say I have a model N; wouldn’t it be enough to do:
`Jacobian = torch.autograd.grad(N.output(x), N.output.weight, grad_outputs=[v1, v2, v3])` ?
As I understand it, this should give me the vector-Jacobian products with the vectors v1, v2 and v3 (`grad_outputs` supplies the vectors that are multiplied against the Jacobian).
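To make my question concrete, here is a minimal sketch of what I have in mind, using a single `nn.Linear` layer as a stand-in for N (the layer, shapes and names are just assumptions for illustration):

```python
import torch

# Hypothetical stand-in for the model N: one linear layer.
model = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)

y = model(x)             # output of the model, shape (2, 3)
v = torch.randn_like(y)  # the "vector" in the product, same shape as y

# grad_outputs=v makes this a vector-Jacobian product v^T J,
# where J = d(y)/d(weight); the result has the weight's shape.
(vjp,) = torch.autograd.grad(y, model.weight, grad_outputs=v)

assert vjp.shape == model.weight.shape  # (3, 4)
```

Note that `grad_outputs` must match the shape of the outputs, so passing a list like `[v1, v2, v3]` would only apply if the model returned three separate output tensors.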
Also, the aim of backpropagation is to get this Jacobian. Does the same process happen behind the scenes when I run the line above as when I simply fit the model and backpropagation is executed?
Put simply, in terms of performance, I am wondering what the gap is between computing this Jacobian inside the training of the model and executing the line above myself.
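For example, here is a small sketch (again with a hypothetical one-layer model) of the comparison I am asking about: calling `torch.autograd.grad` by hand versus letting `.backward()` run during training:

```python
import torch

# Hypothetical stand-in for the model N.
model = torch.nn.Linear(4, 1)
x = torch.randn(2, 4)
loss = model(x).sum()

# Manual call, as in the snippet above (retain_graph so we can
# backpropagate through the same graph a second time):
(g_manual,) = torch.autograd.grad(loss, model.weight, retain_graph=True)

# What a training step does behind the scenes: .backward()
# accumulates the same gradient into model.weight.grad.
loss.backward()

assert torch.allclose(g_manual, model.weight.grad)
```

If both calls really traverse the same backward graph, I would expect them to cost roughly the same, which is the core of my question.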