Hey, I have a function phi that maps a vector of dimension D to a vector of the same dimension D:

phi: R^D -> R^D (phi(x) = y, where x and y are vectors of the same dimension)

Now I have X, a batch of x's with shape N x D (where N is the batch size and D is the dimension). Feeding it into phi gives Y with shape N x D (a batch of y's).

I want the gradient (Jacobian) of each y with respect to its x: I expect grad(phi(x), [x]) to have shape D x D, and the batched version grad(phi(X), [X]) to have shape N x D x D.

My code is

```
torch.autograd.grad(phi(X),[X])[0]
```

but I get this error:

```
RuntimeError: grad can be implicitly created only for scalar outputs
```
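For reference, here is a minimal runnable repro (the particular phi below is an arbitrary placeholder I made up; the actual function doesn't matter, only that its output is non-scalar):

```python
import torch

# Hypothetical stand-in for phi: any elementwise map R^D -> R^D works here
def phi(x):
    return x ** 2 + x.sin()

N, D = 4, 3
X = torch.randn(N, D, requires_grad=True)
Y = phi(X)  # shape (N, D), non-scalar

err = None
try:
    # autograd.grad needs grad_outputs for non-scalar outputs, so this fails
    torch.autograd.grad(Y, [X])[0]
except RuntimeError as e:
    err = e

print(err)  # "grad can be implicitly created only for scalar outputs"
```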

Is there a neat way to achieve this (without looping over the batch)?