To follow on from what KFrank said, torch.autograd.functional.jacobian() returns a Jacobian whose shape is the concatenation of the output shape and the input shape, so you end up with a repeated batch dimension. Because the samples don't interact, you can remove the repeated batch dim by taking the per-sample diagonal with a torch.einsum call:
out = torch.randn(32, 10, 32, 1, 28, 28)  # (batch, outputs, batch, C, H, W)
out = torch.einsum("bobiwh->boiwh", out)  # per-sample diagonal -> (32, 10, 1, 28, 28)
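
For concreteness, here's a minimal sketch of where that shape comes from (the small linear model is just a stand-in for your own network):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # hypothetical model
x = torch.randn(32, 1, 28, 28)  # batch of 32 single-channel 28x28 inputs

# jacobian() treats the whole batch as one big input, so the result's
# shape is (output shape) + (input shape):
jac = torch.autograd.functional.jacobian(model, x)
print(jac.shape)  # torch.Size([32, 10, 32, 1, 28, 28])

# Samples don't interact here, so the cross-sample blocks are all zero
# and the einsum above keeps only the per-sample diagonal:
per_sample = torch.einsum("bobiwh->boiwh", jac)
print(per_sample.shape)  # torch.Size([32, 10, 1, 28, 28])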
Also, surely the title should be "gradient of output w.r.t. input for all samples in batch"? You can't take the gradient of an input w.r.t. the output, unless I'm mistaken?