Hey, I have a function phi that maps a vector of dimension D to a vector of the same dimension D:

phi: R^D -> R^D (phi(x) = y, where x and y are vectors of the same dimension)

Now I have X, a batch of x's with shape N x D (where N is the batch size and D is the dimension). Feeding it into phi gives Y with shape N x D (a batch of y's).

I want the gradient (Jacobian) of each y with respect to its x: I expect grad(phi(x), [x]) to have shape D x D, and the batched version grad(phi(X), [X]) to have shape N x D x D.

My code is

```
torch.autograd.grad(phi(X),[X])[0]
```

but I get this error:

```
RuntimeError: grad can be implicitly created only for scalar outputs
```
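For reference, here is a minimal runnable repro (the particular phi below is an arbitrary placeholder I made up; the actual function doesn't matter, only that its output is non-scalar):

```python
import torch

# Hypothetical stand-in for phi: any elementwise map R^D -> R^D works here
def phi(x):
    return x ** 2 + x.sin()

N, D = 4, 3
X = torch.randn(N, D, requires_grad=True)
Y = phi(X)  # shape (N, D), non-scalar

err = None
try:
    # autograd.grad needs grad_outputs for non-scalar outputs, so this fails
    torch.autograd.grad(Y, [X])[0]
except RuntimeError as e:
    err = e

print(err)  # "grad can be implicitly created only for scalar outputs"
```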

Is there a neat way to achieve this (without looping over the batch)?