I’m trying to write a PyTorch Function for some black-box function we’ll call f(x, y, z), where x, y, z are vectors of varying length and f returns a vector of length 4. I’m confused about the dimensions that I should be returning from the backward function. For example’s sake, say x is a tensor with a single dimension of length 2; then in the backward function I would return a tensor with dimensions 2x4, since the vector that f returns is of length 4. Am I correct on this? Or is there some other way that I’m supposed to calculate the gradient that ALSO results in a tensor that is a single dimension of length 2, i.e. compress the output of [f(x+dx)-f(x)]/dx into a scalar?

Hi Sus!

In short, your backward function should return a tuple of tensors that individually have the same shapes as the tensors input to your custom (forward) function.

Quoting from Extending `torch.autograd`:

`backward()` (or `vjp()`) defines the gradient formula. … It should return as many tensors as there were inputs, with each of them containing the gradient w.r.t. its corresponding input.

Just to be explicit, the “gradient w.r.t. its corresponding input” will have the same shape and type as that input.

No, a 2x4 tensor is not what you should return. In your example case, the backward function should return a one-dimensional tensor of length 2 (to match the input x), plus two more gradient tensors that match y and z.

Yes, there is another way to think about it: your backward function is supposed to compute the so-called *vector-Jacobian product.* Your backward function will have passed into it a tensor (more precisely, a tuple of tensors) that is the same shape as the output of your custom function. This is the “vector” in vector-Jacobian product.

In your example case, this will be a one-dimensional tensor of length 4 (to match the shape of the output of f). The Jacobian of f (with respect to its first argument, x) will indeed be a tensor of shape 2x4, but your backward function is supposed to *contract* the 2x4 Jacobian (if it even explicitly computes the Jacobian, which it need not necessarily do) with the length-4 vector passed into it to form the length-2 vector-Jacobian product that it then returns.
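To make the shape bookkeeping concrete, here is a minimal sketch of what such a custom Function could look like. (The `f_blackbox` stand-in and the finite-difference estimate of the Jacobian are just illustrative assumptions on my part – your real f and your real gradient formula will differ.)

```python
import torch

# Hypothetical stand-in for the black-box f; it just needs to map
# x, y, z (vectors of varying length) to a length-4 vector.
def f_blackbox(x, y, z):
    return torch.stack([x.sum(), (x * x).sum(), y.sum(), z.sum()])

class BlackBoxF(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y, z):
        ctx.save_for_backward(x, y, z)
        return f_blackbox(x, y, z)                 # shape [4]

    @staticmethod
    def backward(ctx, grad_output):                # grad_output: shape [4]
        x, y, z = ctx.saved_tensors
        inputs = [t.detach().clone() for t in (x, y, z)]
        base = f_blackbox(*inputs)                 # f(x, y, z), shape [4]
        eps = 1e-6
        grads = []
        for i, t in enumerate(inputs):
            # forward-difference Jacobian w.r.t. this input, shape [len(t), 4]
            jac = torch.zeros(t.numel(), 4)
            for j in range(t.numel()):
                bumped = [u.clone() for u in inputs]
                bumped[i][j] += eps
                jac[j] = (f_blackbox(*bumped) - base) / eps
            # contract with the incoming "vector": [len(t), 4] @ [4] -> [len(t)]
            grads.append(jac @ grad_output)
        # one gradient per input, each with the same shape as that input
        return tuple(grads)

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.tensor([3.0, 4.0, 5.0], requires_grad=True)
z = torch.tensor([6.0], requires_grad=True)
BlackBoxF.apply(x, y, z).sum().backward()
print(x.grad.shape, y.grad.shape, z.grad.shape)    # [2], [3], [1]
```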

Best.

K. Frank

Thank you so much for the detailed response! From what you’ve told me and what I’ve looked up online about the vector-Jacobian product, I would multiply each partial derivative (row of the 2x4?) of f by the output of f as the row vector, thus making the 2x4 into a 2x1 tensor, then I could just reshape it to be my vjp for x?

Also, what do you mean that I don’t need to explicitly compute the Jacobian? I think that’s exactly what I’m doing, and it would be great if you know of any resources that could clear that up. Thank you so much!

Hi Sus!

As you’ve stated things, no.

The *output* of f (that is, the output f produced during the forward pass) is not directly relevant. Autograd keeps track of the entire computation graph and, in particular, of everything that happens to the output of f during the rest of the forward pass.

In the backward pass, autograd passes into f’s backward function a “vector” that represents the gradient of the final loss function with respect to the output of f. The documentation for Function.backward() calls this argument `grad_outputs`.

So, in your case, your Jacobian matrix would have shape `[2, 4]` and `grad_outputs` would have shape `[4]`. You would multiply `grad_outputs` by the Jacobian matrix to produce the return value of `backward()` of shape `[2]`. (So I think that your result tensor will be a one-dimensional vector without any “singleton” dimension that you would need to `.reshape()` or `.squeeze()` away.)
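Concretely, this is just the following shape arithmetic (a toy check with made-up numbers):

```python
import torch

jacobian = torch.randn(2, 4)      # d f / d x, in the 2x4 layout you described
grad_outputs = torch.randn(4)     # d loss / d (output of f), same shape as f's output

grad_x = jacobian @ grad_outputs  # the vector-Jacobian product
print(grad_x.shape)               # torch.Size([2]) -- already 1-d, nothing to squeeze
```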

(For simplicity, I’m only talking about the case where f outputs only a single tensor, so that autograd only passes a single `grad_outputs` tensor to your `backward()` function.)

You’re certainly allowed to compute the Jacobian and multiply it onto `grad_outputs`. But `backward()` only has to return the *result* of that multiplication. If it can construct that result by some other (possibly cheaper) means, it’s not required to construct the Jacobian explicitly nor explicitly perform the multiplication.

As a trivial example, suppose your custom function maps a length-`n` vector to another length-`n` vector by multiplying it by `12.0`. The Jacobian of this function is an `n x n` diagonal matrix that is `12.0` times the identity matrix. `backward()` can multiply `grad_outputs` by the scalar `12.0` and return it – no need to “materialize” an `n x n` matrix nor perform a matrix-vector multiplication.
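In code, that trivial example might look something like this (the class name is made up for illustration):

```python
import torch

class TimesTwelve(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return 12.0 * x

    @staticmethod
    def backward(ctx, grad_output):
        # vector-Jacobian product with the (never materialized) 12 * identity
        # Jacobian: just scale the incoming gradient.
        return 12.0 * grad_output

x = torch.randn(5, requires_grad=True)
TimesTwelve.apply(x).sum().backward()
print(x.grad)    # a length-5 tensor of 12.0s
```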

Best.

K. Frank

The *output* of f (that is, the output f produced during the forward pass) is not directly relevant. Autograd keeps track of the entire computation graph and, in particular, of everything that happens to the output of f during the rest of the forward pass.

Whoops! That is exactly what I thought I was typing, got tripped up by the `grad_outputs` variable name I have in my code. Thank you!

If it can construct that result by some other (possibly cheaper) means, it’s not required to construct the Jacobian explicitly nor explicitly perform the multiplication.

Oh, I see, thank you so much! It definitely clears things up more for me.