Gradient of batched vector output w.r.t batched vector input?

For an input x of shape [32, 1, 28, 28], the output of my network is y of shape [32, 10].

Is it possible to get the gradient of each output element w.r.t. the input?

Following this thread, I call y.backward(torch.ones_like(y)), but x.grad has shape [32, 1, 28, 28].

Do the individual gradients get summed together at the end? Is there a way to customize this behaviour?

What I am looking for is for x.grad to have shape [32, 10, 28, 28].

An alternative would be for me to repeat the input data and compute the gradient individually for each output element, roughly along the lines of the sketch below.
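Concretely, something like this is what I have in mind (a rough sketch; `net` stands in for my model, and it assumes samples in the batch don't interact, e.g. no batch norm in training mode):

import torch

# x requires grad so we can ask for d y / d x
x = torch.randn(32, 1, 28, 28, requires_grad=True)
y = net(x)                                   # [32, 10]; `net` is a placeholder for my network

cols = []
for j in range(y.shape[1]):
    # gradient of output column j w.r.t. x; summing over the batch is safe because
    # each sample's output only depends on its own input
    g = torch.autograd.grad(y[:, j].sum(), x, retain_graph=True)[0]
    cols.append(g)                           # each g is [32, 1, 28, 28]

jac = torch.stack(cols, dim=1)               # [32, 10, 1, 28, 28]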

Hi Pagan!

Note that your title is “Gradient of vector output w.r.t batch” (in contrast
to “input w.r.t. the output”). I’ll answer the question in your title.

If the output is a vector (rather than a single scalar), you are looking
for the Jacobian of the function that maps input to output.

torch.autograd.functional.jacobian() should do what you want, and
automates some of the work you could have chosen to do by hand.
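
For example, a minimal sketch (the Sequential model here is just a throw-away stand-in for your network):

import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(32, 1, 28, 28)

# jacobian() returns a tensor of shape output.shape + input.shape
jac = torch.autograd.functional.jacobian(model, x)
print(jac.shape)   # torch.Size([32, 10, 32, 1, 28, 28])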

Best.

K. Frank

Thanks @KFrank for the response. I have fixed the mistake in the title.

To follow on from what KFrank said, torch.autograd.functional.jacobian() returns a tensor whose shape is the concatenation of the output shape and the input shape, so you’ll have a repeated batch dimension. You can remove the repeated batch dim (i.e. take the diagonal over the two batch axes) via a torch.einsum call:

import torch
out = torch.randn(32, 10, 32, 1, 28, 28)   # stand-in for the jacobian() output
out = torch.einsum("bobiwh->boiwh", out)   # repeated 'b' index takes the diagonal over the two batch dims -> [32, 10, 1, 28, 28]
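
Putting the two steps together (again with a throw-away model standing in for yours; the cross-sample blocks you discard are exactly zero as long as samples in the batch don't interact):

import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(32, 1, 28, 28)

jac = torch.autograd.functional.jacobian(model, x)   # [32, 10, 32, 1, 28, 28]
jac = torch.einsum("bobiwh->boiwh", jac)             # [32, 10, 1, 28, 28]
jac = jac.squeeze(2)                                 # [32, 10, 28, 28]

If your PyTorch version supports it, passing vectorize=True to jacobian() can speed this up considerably.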

Also, surely the title should be “gradient of output w.r.t. input for all samples in the batch”? You can’t have a gradient of an input w.r.t. the output, unless I’m mistaken?