Hi, just one more thing to follow up on: how do I use `torch.autograd.grad(y, X2)`

when my `y = NN(X1, X2)`

is a matrix `N x D`,

where N is the number of points and D is the dimension? I did `torch.autograd.grad(y, X2, create_graph=True, grad_outputs=torch.ones_like(y))[0]`

and it returns a tensor with shape `(N,)`

(it takes the derivative of the sum over the output dimensions), but I'm expecting `(N, D)`,

i.e. the derivative of each output dimension taken separately (I want to keep each dimension's derivative positive).

My toy example is

```
x = torch.randn(5, 1)  # my X1 above
z = torch.ones(5, 1, requires_grad=True)  # my X2 above
f = nn.Linear(2, 3)
x = torch.cat([x, z], dim=-1)  # concatenated input, shape (5, 2)
y = f(x)  # shape (5, 3)
torch.autograd.grad(y, z, create_graph=True, grad_outputs=torch.ones_like(y))[0]  # gives 5x1, but I expect 5x3
```
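(For completeness: one way I found to get the per-dimension derivatives without a Python loop is the experimental `is_grads_batched=True` flag of `torch.autograd.grad` (PyTorch >= 1.11), which accepts a batch of one-hot `grad_outputs`, one per output dimension. A sketch, assuming the same toy model as above:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 1)
z = torch.ones(5, 1, requires_grad=True)
f = nn.Linear(2, 3)
y = f(torch.cat([x, z], dim=-1))  # (5, 3)

N, D = y.shape
# One one-hot cotangent per output dimension, batched along dim 0: (D, N, D)
vs = torch.eye(D).unsqueeze(1).repeat(1, N, 1)
(g,) = torch.autograd.grad(y, z, grad_outputs=vs, is_grads_batched=True)
dy_dz = g.squeeze(-1).T  # g: (D, N, 1) -> (N, D)
print(dy_dz.shape)  # torch.Size([5, 3])
```

Here every row of `dy_dz` should equal `f.weight[:, 1]`, since `y[:, d] = W[d, 0]*x + W[d, 1]*z + b[d]`.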

To be more specific, for example, for a single data point:

I have a vector-valued function such that

`f(x, z) = [y1, y2, y3] = [x + 3z, x + 4z, x + 5z]`

Then `df/dz = [3, 4, 5]`, but autograd gives me the sum, `[12]`,

so I guess autograd computes it as

`torch.ones(y.shape) @ [[dy1/dx, dy1/dz], [dy2/dx, dy2/dz], [dy3/dx, dy3/dz]] = [[1, 1, 1]] @ [[1, 3], [1, 4], [1, 5]]`
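(That guess can be checked numerically. A small sketch with the toy `f` above, using `torch.autograd.functional.jacobian` to get the unsummed column:)

```python
import torch

def f(x, z):
    # the toy function above: f(x, z) = [x + 3z, x + 4z, x + 5z]
    return torch.stack([x + 3 * z, x + 4 * z, x + 5 * z])

x = torch.tensor(1.0)
z = torch.tensor(2.0, requires_grad=True)
y = f(x, z)

# grad_outputs = ones left-multiplies the Jacobian by [1, 1, 1],
# so the per-output derivatives w.r.t. z are summed: 3 + 4 + 5 = 12
g = torch.autograd.grad(y, z, grad_outputs=torch.ones_like(y))[0]
print(g)  # tensor(12.)

# the full column dy/dz, one entry per output dimension:
J = torch.autograd.functional.jacobian(lambda z: f(x, z), z)
print(J)  # tensor([3., 4., 5.])
```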

Is there any way to avoid computing this final matrix multiplication?

The naive way would be:

```
# per-sample version: y has shape (1, D), x = [x1, z] has shape (1, 2)
J = []
for i in range(D):
    out = torch.zeros(1, D)
    out[0][i] = 1  # select the i-th output dimension
    j = torch.autograd.grad(y, x, create_graph=True, grad_outputs=out)[0]
    J.append(j[0])
J = torch.stack(J)  # (D, 2) Jacobian
dy_dz = J[:, -1]  # last column: derivatives w.r.t. z
```

This is just not going to work, since I can't loop over a high-dimensional output on every iteration, and it doesn't seem reasonable anyway: I only want `dy_dz` (the last column of the Jacobian), yet I'd have to loop over every output dimension to build the full Jacobian.
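(One possible way out, since only the last Jacobian column is needed: that column is exactly one Jacobian-vector product with tangent 1 on `z`, so forward-mode AD computes it in a single pass with no loop over D. A sketch assuming `torch.func.jvp`, available in PyTorch >= 2.0:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Linear(2, 3)
x = torch.randn(5, 1)
z = torch.ones(5, 1)

def model(z):
    return f(torch.cat([x, z], dim=-1))  # (5, 3)

# Tangent dz = 1 for every sample: one forward-mode pass yields the
# dy/dz column for all N samples and D output dimensions at once.
y, dy_dz = torch.func.jvp(model, (z,), (torch.ones_like(z),))
print(dy_dz.shape)  # torch.Size([5, 3])
```

For this linear toy model each row of `dy_dz` is just `f.weight[:, 1]`, but the same one-pass trick applies to any network where each sample's output depends only on that sample's `z`.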