Get the gradient tape


I would like to be able to retrieve the gradient tape of a given gradient computation.

For instance, let’s say I compute the gradient of my outputs with respect to given weights using torch.autograd.grad; is there any way to get access to its tape?

Thank you,


We don’t really build gradient tapes per se, but graphs. You can access some limited properties of the graph from Python. This package can be used, for example, to plot the graph nicely.

Why do you want to get access to it?

Hi alban,

Thank you for your answer.

Ok I see. I am trying to get the gradients of the outputs of a neural network with respect to the weights. Usually backpropagation provides the gradient of the cost function, which requires computing the gradient of the outputs. I am wondering how to get this gradient, do you have any idea?

Thank you,

If you have multiple outputs, it is called a Jacobian. Unfortunately, this is going to be quite expensive to compute (you can see this gist on how to do it).
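To make the cost concrete, here is a minimal sketch of computing a full Jacobian row by row, in the same spirit as the gist mentioned above: one backward pass per output component. The toy `torch.nn.Linear` model here is illustrative, not from the original thread.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)  # toy model with 3 outputs
x = torch.randn(4)
y = model(x)

rows = []
for i in range(y.numel()):
    # gradient of the i-th output w.r.t. the weight matrix
    (g,) = torch.autograd.grad(y[i], model.weight, retain_graph=True)
    rows.append(g.reshape(-1))

# one backward pass per output: 3 passes for a 3-dimensional output
jacobian = torch.stack(rows)
print(jacobian.shape)  # torch.Size([3, 12])
```

Each row requires its own backward pass, which is why the cost grows linearly with the number of outputs.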

Thanks for the answer.

Let’s say I have a model N, would it not be enough to do:

`Jacobian = torch.autograd.grad(N.output(x), N.output.weight, grad_outputs=[v1, v2, v3])` ?

This should give me the Jacobian-vector products with the vectors v1, v2 and v3.

Also, the aim of backpropagation is to get this Jacobian. Does the same process happen behind the scenes when I perform the previous line of code as when I just fit the model and backpropagation is executed?

Put simply, in terms of performance, I am wondering what the gap is between computing this Jacobian inside the training of the model and executing the previous line.

Unfortunately, we do not support doing the vector Jacobian product with multiple vectors at the same time. You have to use a for loop and multiple calls to backward (as is done in the gist I linked above).
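A sketch of that loop, assuming a small illustrative model (the names v1, v2, v3 follow the discussion above but the model itself is made up): one `torch.autograd.grad` call per vector.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)  # toy model, for illustration only
x = torch.randn(4)

vectors = [torch.randn(3) for _ in range(3)]  # stand-ins for v1, v2, v3

vjps = []
for v in vectors:
    # recompute the forward pass each time (alternatively, compute y
    # once and pass retain_graph=True to autograd.grad)
    y = model(x)
    (g,) = torch.autograd.grad(y, model.weight, grad_outputs=v)
    vjps.append(g)

print(len(vjps), vjps[0].shape)  # 3 torch.Size([3, 4])
```

Each iteration is an independent vector-Jacobian product; there is no single call that handles all three vectors at once.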

> Also, the aim of backpropagation is to get this Jacobian.

This is only true when your function has a scalar output. If it has multiple outputs, then it only computes vector-Jacobian products. This is a limitation of any deep learning framework and of Automatic Differentiation in general.
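A small, self-contained illustration of this point (the function here is my own toy example): for a vector-valued output, autograd only ever gives you vᵀJ for a chosen v, and the scalar case is just the special case v = 1.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
y = w * w  # vector-valued output; its Jacobian is diag(2w)

# autograd computes v^T J for one vector v per backward pass
v = torch.ones(3)
(vjp,) = torch.autograd.grad(y, w, grad_outputs=v)

# with v = ones, v^T J is exactly the gradient of y.sum(), i.e. 2*w
print(torch.allclose(vjp, 2 * w))  # True
```

The full Jacobian is only recovered by repeating this with one basis vector per output, which is why multiple outputs are expensive.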

The performance cost of computing this during training is going to be quite high. It is very dependent on your network, so you will want to try it and see how slow it is.