I am trying to get the gradient of a vector (with length m and batch size N) with respect to another vector (with length m and batch size N). Hence, I am expecting the gradient matrix to be Nxmxm.
x = torch.rand(5, 3)
u = x*x
dudx = grad(u, x, torch.ones(x.size(), 3, 3), create_graph=True, retain_graph=True)
What is the mistake I am making? Many thanks in advance
First, You seem to assume that
N is a batch size that should be considered specially? But the autograd does not know about that. It will compute the gradient considering N as any other dimension.
autograd.grad() does a vector jacobian product (when grad_outputs is provided) and so will return something with the same size as the input.
You can use
autograd.functional.jacobian() to get the full Jacobian but as mentionned above, you will get
NxmxNxm as N is considered as any other dimension.
You can do a for loop around
jacobian for the N dimension if you want to consider the batch samples one at a time.
Thanks, @albanD. This is helpful.