z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]
print(d2y)
Can you also provide X and y? Anyway, here is a complete example:
import torch

X = torch.tensor([1., 2., 3.], requires_grad=True)
y = X ** 3
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]
print(d2y)
The output of d2y is tensor([ 6., 12., 18.], grad_fn=<MulBackward0>), which is exactly the second-order derivative of y = x**3, isn't it? y'' = 6x
I am a newbie to PyTorch. I recently tried to use PyTorch to implement the PINN framework. The following is a simplified network I designed to verify that my second-order derivative implementation is correct, but the output is clearly not what I expect. I also tested the Linear() and Sigmoid() layers separately in the same way, and each gave the correct result on its own. But when I combine them and take the second derivative, something goes wrong. I hope you can help me find the problem.
import torch
from torch import nn

x = torch.tensor([[2., 1.]], dtype=torch.float32).t()
y = torch.tensor([[1., 0.]], dtype=torch.float32).t()
net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid())

def init_ones(m):
    if type(m) == nn.Linear:
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

net.apply(init_ones)
net[0].weight.data[1][0] = 0.0
net[0].weight, net[0].bias
(Parameter containing:
 tensor([[1., 1.],
         [0., 1.]], requires_grad=True),
 Parameter containing:
 tensor([0., 0.], requires_grad=True))
X = torch.cat((x, y), 1)
X.requires_grad_(True)
y = net(X)
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, retain_graph=True, create_graph=True)[0]
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]
print(d2y)
Perhaps it has something to do with how nn.Linear is defined, i.e. y = x @ A.T, where '@' is matmul, not the usual y = A @ x. So during backprop, dy/dx == grad @ A, whereas if y = A @ x, then dy/dx == A.T @ grad.
I am a new learner myself, and I didn't notice this until now. Thanks, dude!
BTW, you can test this by using:
import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]], requires_grad=True)
y = torch.nn.Sigmoid()(torch.matmul(X, w.T))
print(f'weight: {w}\n')
print(f'input: {X}\n')
print(f'output: {y}\n')
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, retain_graph=True, create_graph=True)[0]
print(f'X.grad is: {dy}')
which gives the same result as your dy, whereas if you used y = torch.nn.Sigmoid()(torch.matmul(w, X)), then dy would be the transpose of your dy.
Oh, dude, you did a great job just now.
Maybe I did not express my question clearly enough. After manual verification, the first-order derivative is correct. But the result is wrong when the second derivative is computed by

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]
print(d2y)

If you calculate the second derivative of this network by hand, the correct result should be tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]]), but the output of the program above is tensor([[-0.0818, -0.1726], [-0.1817, -0.1817]]).
I believe PyTorch is correct. Basically the problem is this:

y = sigmoid(x @ w.T)
dy/dx = (y * (1 - y)) @ w

where @ is matrix multiplication and * is element-wise multiplication (strictly, dy/dx = (z * y * (1 - y)) @ w, but z is all ones here).
To calculate the second-order derivative d2y/dx2, you can draw a computational graph starting from x and ending at dy/dx, then backpropagate to x. The final answer is this:

d2y/dx2 = ((z @ w.T) * (1 - 2*y) * y * (1 - y)) @ w

Of course, when I first tried to derive the above equations, I got the wrong answer too... if anyone is good at math please let me know how it's derived...
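To double-check, here is a minimal sketch (reusing the X and w from the test above) that compares this closed-form expression against autograd's double backward:

```python
import torch

# same tensors as in the example above
X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]])
y = torch.sigmoid(X @ w.T)
z = torch.ones_like(y)

# what the program computes: two backward passes with all-ones grad_outputs
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
d2y = torch.autograd.grad(dy, X, grad_outputs=z)[0]

# the closed-form expression from the formula above
manual = ((z @ w.T) * (1 - 2 * y) * y * (1 - y)) @ w

print(torch.allclose(d2y, manual))  # True
```

So the formula and autograd agree; the disagreement is only with the hand calculation.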
This will only work if X is a scalar; what you need is torch.autograd.functional.hessian. The docs for that function are here: torch.autograd.functional.hessian — PyTorch 1.9.1 documentation
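For example, a minimal sketch (assuming PyTorch >= 1.5, where torch.autograd.functional was introduced; hessian expects a scalar-valued function, so the outputs are reduced with sum() here):

```python
import torch
from torch.autograd.functional import hessian

w = torch.tensor([[1., 1.], [0., 1.]])

def f(x):
    # hessian() requires a function with a single scalar output,
    # so reduce the network output with sum()
    return torch.sigmoid(x @ w.T).sum()

x = torch.tensor([[2., 1.], [1., 0.]])
H = hessian(f, x)
print(H.shape)  # torch.Size([2, 2, 2, 2]): d2f / dx[i,j] dx[k,l]
```

Since the input is [2, 2], the full Hessian has four indices; the all-ones grad_outputs trick effectively sums over the first pair of indices, which is why it mixes entries together.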
Thank you for informing me of this function, but the PyTorch version I use is too old to support it. As far as I know, my code above is also based on the Hessian matrix.
Can you not update to a higher version of PyTorch? If not, what's the shape of X and y?
X = tensor([[2., 1.],
            [1., 0.]])
y = tensor([[0.9526, 0.7311],
            [0.7311, 0.5000]], grad_fn=<SigmoidBackward>)
So X and y are both shape [2, 2]? Is this for a single sample or for all samples?
A single sample. I use this simple program to verify the correctness of the code block that I will later use in the PINN training.
That example only works because the input arrays both hold scalars; for non-scalar arguments it won't work, which is why the torch.autograd.functional library exists!
Maybe you are right, dude
There's probably a way to do this without the torch.autograd.functional library, although I'm not 100% sure how. You could have a look at the source code for both functions and see if you can reproduce them without upgrading PyTorch.
If you can update PyTorch, I'd recommend you do it.
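For what it's worth, one way this can be done on older PyTorch versions (a sketch, not how torch.autograd.functional implements it): instead of passing all-ones grad_outputs, which sums rows of the Hessian together, pass a one-hot grad_outputs in both backward passes and read off one entry at a time. With the X and w from the example above, this reproduces the hand-computed tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]]):

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]])
y = torch.sigmoid(X @ w.T)

# d2[i,j] = second derivative of y[i,j] with respect to X[i,j]
d2 = torch.zeros_like(X)
for idx in range(y.numel()):
    # one-hot selector for a single output element
    e = torch.zeros_like(y).view(-1)
    e[idx] = 1.0
    e = e.view_as(y)
    # first backward: gradient of y[i,j] w.r.t. all of X
    g = torch.autograd.grad(y, X, grad_outputs=e, create_graph=True)[0]
    # second backward: gradient of g[i,j] w.r.t. all of X
    h = torch.autograd.grad(g, X, grad_outputs=e, retain_graph=True)[0]
    d2.view(-1)[idx] = h.view(-1)[idx]

print(d2)  # tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]])
```

This is one backward pass per output element, so it is slow for large tensors, but it avoids summing the mixed terms that the all-ones trick introduces.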