Hi, just one more thing to follow up on: how do I use `torch.autograd.grad(y, X2)`

when my `y = NN(X1, X2)`

is a matrix `N x D`,

where N is the number of points and D is the dimension? I did `torch.autograd.grad(y, X2, create_graph=True, grad_outputs=torch.ones_like(y))[0]`

and it returns a tensor with shape `(N,)`

(it takes the derivative of the sum over the output dimensions), but I'm expecting `(N, D)`,

i.e. the derivative of each output dimension taken separately (I want to keep each dimension's derivative positive).

My toy example is

```
x = torch.randn(5, 1)  # my X1 above
z = torch.ones(5, 1, requires_grad=True)  # my X2 above
f = nn.Linear(2, 3)
x = torch.cat([x, z], dim=-1)  # concatenated input, shape (5, 2)
y = f(x)  # shape (5, 3)
torch.autograd.grad(y, z, create_graph=True, grad_outputs=torch.ones_like(y))[0]  # gives 5x1, but I expect 5x3
```
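(For completeness: one way I found to get the per-dimension derivatives without a Python loop is the experimental `is_grads_batched=True` flag of `torch.autograd.grad` (PyTorch >= 1.11), which accepts a batch of one-hot `grad_outputs`, one per output dimension. A sketch, assuming the same toy model as above:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 1)
z = torch.ones(5, 1, requires_grad=True)
f = nn.Linear(2, 3)
y = f(torch.cat([x, z], dim=-1))  # (5, 3)

N, D = y.shape
# One one-hot cotangent per output dimension, batched along dim 0: (D, N, D)
vs = torch.eye(D).unsqueeze(1).repeat(1, N, 1)
(g,) = torch.autograd.grad(y, z, grad_outputs=vs, is_grads_batched=True)
dy_dz = g.squeeze(-1).T  # g: (D, N, 1) -> (N, D)
print(dy_dz.shape)  # torch.Size([5, 3])
```

Here every row of `dy_dz` should equal `f.weight[:, 1]`, since `y[:, d] = W[d, 0]*x + W[d, 1]*z + b[d]`.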

To be more specific, for example, for a single data point:

I have a vector-valued function such that

`f(x, z) = [y1, y2, y3] = [x + 3z, x + 4z, x + 5z]`

Then `df/dz = [3, 4, 5]`, but autograd gives me the sum, `[12]`,

so I guess autograd computes it as

`torch.ones(y.shape) @ [[dy1/dx, dy1/dz], [dy2/dx, dy2/dz], [dy3/dx, dy3/dz]] = [[1, 1, 1]] @ [[1, 3], [1, 4], [1, 5]]`
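(That guess can be checked numerically. A small sketch with the toy `f` above, using `torch.autograd.functional.jacobian` to get the unsummed column:)

```python
import torch

def f(x, z):
    # the toy function above: f(x, z) = [x + 3z, x + 4z, x + 5z]
    return torch.stack([x + 3 * z, x + 4 * z, x + 5 * z])

x = torch.tensor(1.0)
z = torch.tensor(2.0, requires_grad=True)
y = f(x, z)

# grad_outputs = ones left-multiplies the Jacobian by [1, 1, 1],
# so the per-output derivatives w.r.t. z are summed: 3 + 4 + 5 = 12
g = torch.autograd.grad(y, z, grad_outputs=torch.ones_like(y))[0]
print(g)  # tensor(12.)

# the full column dy/dz, one entry per output dimension:
J = torch.autograd.functional.jacobian(lambda z: f(x, z), z)
print(J)  # tensor([3., 4., 5.])
```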

Is there any way to avoid computing this final matrix multiplication?

The naive way would be:

```
# per-sample version: y has shape (1, D), x = [x1, z] has shape (1, 2)
J = []
for i in range(D):
    out = torch.zeros(1, D)
    out[0][i] = 1  # select the i-th output dimension
    j = torch.autograd.grad(y, x, create_graph=True, grad_outputs=out)[0]
    J.append(j[0])
J = torch.stack(J)  # (D, 2) Jacobian
dy_dz = J[:, -1]  # last column: derivatives w.r.t. z
```

This is just not going to work, since I can't loop over a high-dimensional output on every iteration, and it doesn't seem reasonable anyway: I only want `dy_dz` (the last column of the Jacobian), yet I'd have to loop over every output dimension to build the full Jacobian.
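(One possible way out, since only the last Jacobian column is needed: that column is exactly one Jacobian-vector product with tangent 1 on `z`, so forward-mode AD computes it in a single pass with no loop over D. A sketch assuming `torch.func.jvp`, available in PyTorch >= 2.0:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Linear(2, 3)
x = torch.randn(5, 1)
z = torch.ones(5, 1)

def model(z):
    return f(torch.cat([x, z], dim=-1))  # (5, 3)

# Tangent dz = 1 for every sample: one forward-mode pass yields the
# dy/dz column for all N samples and D output dimensions at once.
y, dy_dz = torch.func.jvp(model, (z,), (torch.ones_like(z),))
print(dy_dz.shape)  # torch.Size([5, 3])
```

For this linear toy model each row of `dy_dz` is just `f.weight[:, 1]`, but the same one-pass trick applies to any network where each sample's output depends only on that sample's `z`.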