How to get separate gradients from torch.autograd


I am trying to get the gradients of a function with multiple vector outputs. Let x, y, z be the inputs and u, v, w the outputs, all of them vectors of length N. If I use torch.autograd.grad, my code looks like this:

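import torch

def func(x):
    # do something fancy to x
    ...

def gradients(outpt, inpt):
    # grad_outputs=ones weights every output element equally
    return torch.autograd.grad(
        outpt, inpt, grad_outputs=torch.ones_like(outpt), create_graph=True
    )[0]  # autograd.grad returns a tuple with one tensor per input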

inpt = torch.stack((x, y, z), 1)
outpt = func(inpt)
grad = gradients(outpt, inpt)

The resulting grad is a matrix of shape N*3, but because grad_outputs is all ones, each entry is summed over the outputs: grad[:, 0] is actually $u_x + v_x + w_x$ (the sum of the partial derivatives of u, v and w with respect to x). What I want is $u_x, u_y, u_z$ (and likewise for v and w) separately. I have also tried torch.autograd.functional.jacobian, as follows:
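To make the summing behavior concrete, here is a minimal sketch. It assumes a toy row-wise function (toy_func, with N = 4) as a stand-in for func, which is not the real model, and checks the first column of the summed gradient against the analytic partial derivatives.

```python
import torch

def toy_func(p):
    # Hypothetical stand-in for func: acts on each row (x, y, z) independently.
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    u = x * y        # u_x = y, u_y = x, u_z = 0
    v = y * z        # v_x = 0, v_y = z, v_z = y
    w = x + z ** 2   # w_x = 1, w_y = 0, w_z = 2z
    return torch.stack((u, v, w), 1)

torch.manual_seed(0)
N = 4
inpt = torch.rand(N, 3, requires_grad=True)
outpt = toy_func(inpt)

# With grad_outputs=ones, the result is the gradient of outpt.sum()
# with respect to inpt, so the three outputs are mixed together.
(summed,) = torch.autograd.grad(
    outpt, inpt, grad_outputs=torch.ones_like(outpt), create_graph=True
)

# Column 0 should equal u_x + v_x + w_x = y + 0 + 1 for this toy function.
expected_col0 = inpt[:, 1] + 1
print(summed.shape)
print(torch.allclose(summed[:, 0], expected_col0))
```

For this toy function the check passes, which illustrates that a single call with an all-ones grad_outputs cannot separate $u_x$ from $v_x$ and $w_x$.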

def func(x):
    # do something fancy to x
    ...

def jacobian(f, x):
    # The full Jacobian has shape (N, 3, N, 3); sum over the output-sample axis
    return torch.sum(
        torch.autograd.functional.jacobian(f, x, create_graph=True), dim=0
    )

inpt = torch.stack((x, y, z), 1)
grad = jacobian(func, inpt)

This gives me a grad that contains the separate gradients of u, v, w with respect to x, y, z. Its shape is 3*N*3, where $u_y$ (the derivative of u with respect to y) is grad[0, :, 1], and so on. However, it computes the derivative of every single output element with respect to every single input element (an N*3*N*3 Jacobian before the sum), which uses far more GPU memory than necessary.
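The memory overhead can be seen in a small sketch. It again assumes a hypothetical row-wise toy_func in place of func and N = 4. For such a function, every Jacobian entry that couples two different samples is zero, so most of the N*3*N*3 tensor is wasted.

```python
import torch

def toy_func(p):
    # Hypothetical row-wise stand-in for func (not the real model).
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    return torch.stack((x * y, y * z, x + z ** 2), 1)

torch.manual_seed(0)
N = 4
inpt = torch.rand(N, 3)

# Full Jacobian: one entry per (output element, input element) pair.
full = torch.autograd.functional.jacobian(toy_func, inpt)
print(full.shape)   # output shape followed by input shape: (N, 3, N, 3)

# Summing over the output-sample axis gives the compact (3, N, 3) layout.
grad = full.sum(dim=0)
print(grad.shape)
print(full.numel(), grad.numel())   # 9*N*N entries versus 9*N

# For this toy function u = x * y, so u_y should equal x.
print(torch.allclose(grad[0, :, 1], inpt[:, 0]))

# Zero out the same-sample blocks; whatever remains couples different samples.
idx = torch.arange(N)
cross = full.clone()
cross[idx, :, idx, :] = 0
print(cross.abs().max())
```

The full tensor grows as 9*N*N while the part that is actually needed grows as 9*N, which is the overhead described above.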

So, is there an elegant way to get the gradients I want with torch.autograd? Thank you so much!