Take gradient w.r.t. a matrix

Hey, I have a function that maps a vector of dimension D to another vector of dimension D:

phi: R^D -> R^D (phi(x) = y, where x and y are vectors of the same dimension D)

Now I have X, a batch of x vectors (shape NxD, where N is the batch size and D is the dimension). Feeding X into phi gives Y of shape NxD (a batch of y vectors).

I want the gradient (Jacobian) of each y with respect to its x: I expect grad(phi(x), [x]) to have size DxD, and grad(phi(X), [X]) to have size NxDxD.
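Concretely, with a made-up phi just to illustrate the shapes I am after:

    import torch

    N, D = 4, 3

    def phi(x):
        # made-up stand-in for my actual function, maps (..., D) -> (..., D)
        return torch.tanh(x) * x.sum(dim=-1, keepdim=True)

    X = torch.randn(N, D, requires_grad=True)
    Y = phi(X)  # shape (N, D)

    # What I want is jac of shape (N, D, D) with
    #   jac[n, j, i] = dY[n, j] / dX[n, i]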

My code is

torch.autograd.grad(phi(X),[X])[0]

but I get the error

RuntimeError: grad can be implicitly created only for scalar outputs

Is there a neat way to achieve this (without looping)?

I think you are running into the classic limitation of reverse-mode automatic differentiation: for a single (scalar) output you can compute the gradients with respect to all inputs in one backward pass, but to take the gradients of multiple outputs you must run backward multiple times, once per output.
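To see what those multiple backward passes look like in practice, here is a minimal sketch of the usual reverse-mode workaround, one pass per output dimension (phi below is just a placeholder, and summing over the batch is only valid if phi treats the batch rows independently):

    import torch

    N, D = 4, 3

    def phi(x):
        # placeholder for your function, mapping (N, D) -> (N, D) row-wise
        return torch.tanh(x) * x.sum(dim=-1, keepdim=True)

    X = torch.randn(N, D, requires_grad=True)
    Y = phi(X)  # (N, D)

    rows = []
    for j in range(D):
        # Y[:, j].sum() collapses the batch; the cross-batch terms are zero
        # as long as phi(X)[n] depends only on X[n].
        g = torch.autograd.grad(Y[:, j].sum(), X, retain_graph=True)[0]  # (N, D)
        rows.append(g)
    jac = torch.stack(rows, dim=1)  # (N, D, D): jac[n, j, i] = dY[n, j] / dX[n, i]

This still loops over D (though not over N*D), which is exactly the reverse-mode limitation described above.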

PyTorch doesn’t natively support forward-mode AD, but there’s a cute trick you can use to get it: see https://github.com/pytorch/pytorch/issues/10223 (unfortunately, it is not very efficient).
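As I recall, the trick in that issue builds a Jacobian-vector product (i.e. forward mode) out of two reverse-mode passes; roughly something like this sketch (not the exact code from the issue):

    import torch

    def jvp(f, x, v):
        # Forward-mode product J(x) @ v via the "double backward" trick:
        # first build u |-> u^T J with create_graph=True, then differentiate
        # that expression (which is linear in u) with respect to u in the
        # direction v, giving J @ v. Here v is a direction in the input space.
        x = x.detach().requires_grad_(True)
        y = f(x)
        u = torch.zeros_like(y, requires_grad=True)  # dummy cotangent
        uJ = torch.autograd.grad(y, x, grad_outputs=u, create_graph=True)[0]
        return torch.autograd.grad(uJ, u, grad_outputs=v)[0]

Calling jvp once per input dimension (with v running over the D standard basis vectors, broadcast across the batch) recovers the full NxDxD Jacobian, which is part of why it is not very efficient.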