Gradient of batched vector output w.r.t batched vector input?

For an input x of shape [32, 1, 28, 28], the output of my network is y of shape [32, 10].

Is it possible to get the gradient of each output element w.r.t. the input?

Following this thread, I call y.backward(torch.ones_like(y)), but x.grad has shape [32, 1, 28, 28].

Do the individual gradients get summed together at the end? Is there a way to customize this behaviour?

What I am looking for is for x.grad to have shape [32, 10, 28, 28].

An alternative would be for me to repeat the input data and compute the gradient individually for each output element, roughly along the lines of the sketch below.
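Concretely, something like this is what I have in mind (a rough sketch; `net` stands in for my model, and it assumes samples in the batch don't interact, e.g. no batch norm in training mode):

import torch

# x requires grad so we can ask for d y / d x
x = torch.randn(32, 1, 28, 28, requires_grad=True)
y = net(x)                                   # [32, 10]; `net` is a placeholder for my network

cols = []
for j in range(y.shape[1]):
    # gradient of output column j w.r.t. x; summing over the batch is safe because
    # each sample's output only depends on its own input
    g = torch.autograd.grad(y[:, j].sum(), x, retain_graph=True)[0]
    cols.append(g)                           # each g is [32, 1, 28, 28]

jac = torch.stack(cols, dim=1)               # [32, 10, 1, 28, 28]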

Hi Pagan!

Note that your title is “Gradient of vector output w.r.t batch” (in contrast
to “input w.r.t. the output”). I’ll answer the question in your title.

If the output is a vector (rather than a single scalar), you are looking
for the Jacobian of the function that maps input to output.

torch.autograd.functional.jacobian() should do what you want, and
automates some of the work you could have chosen to do by hand.
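
For example, a minimal sketch (the Sequential model here is just a throw-away stand-in for your network):

import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(32, 1, 28, 28)

# jacobian() returns a tensor of shape output.shape + input.shape
jac = torch.autograd.functional.jacobian(model, x)
print(jac.shape)   # torch.Size([32, 10, 32, 1, 28, 28])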

Best.

K. Frank

Thanks @KFrank for the response. I have fixed the mistake in the title.

To follow on from what KFrank said, torch.autograd.functional.jacobian() returns a tensor whose shape is the concatenation of the output shape and the input shape, so you’ll have a repeated batch dimension. You can remove the repeated batch dim (i.e. take the diagonal over the two batch axes) via a torch.einsum call:

import torch
out = torch.randn(32, 10, 32, 1, 28, 28)   # stand-in for the jacobian() output
out = torch.einsum("bobiwh->boiwh", out)   # repeated 'b' index takes the diagonal over the two batch dims -> [32, 10, 1, 28, 28]
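
Putting the two steps together (again with a throw-away model standing in for yours; the cross-sample blocks you discard are exactly zero as long as samples in the batch don't interact):

import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(32, 1, 28, 28)

jac = torch.autograd.functional.jacobian(model, x)   # [32, 10, 32, 1, 28, 28]
jac = torch.einsum("bobiwh->boiwh", jac)             # [32, 10, 1, 28, 28]
jac = jac.squeeze(2)                                 # [32, 10, 28, 28]

If your PyTorch version supports it, passing vectorize=True to jacobian() can speed this up considerably.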

Also, surely the title should be “gradient of output w.r.t. input for all samples in the batch”? You can’t have a gradient of an input w.r.t. the output, unless I’m mistaken?