The following is my code block for computing the second-order derivative of a matrix. Why is the second-order derivative it outputs different from the one I calculated by hand?

z = torch.ones_like(y)

dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
print(dy)

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]
print(d2y)

Can you also provide X and y?
Anyway, here it is:
import torch
X=torch.tensor([1.,2.,3.],requires_grad=True)
y=X**3
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]
print(d2y)

The output of d2y is tensor([ 6., 12., 18.], grad_fn=<MulBackward0>), which is just the second-order derivative of y=x**3, isn't it? y'' = 6x.

I am a newbie to PyTorch. I recently tried to use PyTorch to implement the PINN framework. The following is a simplified network layer I designed to verify that my program computes second-order derivatives correctly, but the output is clearly not what I expect. I also tested the Linear() and Sigmoid() layers separately in the same way, and the results were fine. But I don't know why there is a problem when I combine them and take the second derivative. I hope you can help me find the problem.

import torch
from torch import nn

x = torch.tensor([[2., 1]], dtype=torch.float32).t()
y = torch.tensor([[1, 0]], dtype=torch.float32).t()
net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid())

def init_ones(m):
    if type(m) == nn.Linear:
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

net.apply(init_ones)
net[0].weight.data[1][0] = 0.0
net[0].weight, net[0].bias

(Parameter containing:
 tensor([[1., 1.],
         [0., 1.]], requires_grad=True),
 Parameter containing:
 tensor([0., 0.], requires_grad=True))

X=torch.cat((x,y),1)
X.requires_grad_(True)
y=net(X)

z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z,retain_graph=True, create_graph=True)[0]
print(dy)

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]
print(d2y)

Perhaps it has something to do with how nn.Linear is defined? i.e. y = x @ A.T, where '@' is matmul, not the usual y = A @ X. So during backprop, dy/dx == grad @ A, whereas if y = A @ X, then dy/dx == A.T @ grad.

I am a new learner myself too, and I didn't notice this until now. Thanks, dude!

BTW, you can test this by using:

import torch
X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]], requires_grad=True)
y = torch.nn.Sigmoid()(torch.matmul(X, w.T))
print(f'weight:{w}\n')
print(f'input:{X}\n')
print(f'output:{y}\n')
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, retain_graph=True, create_graph=True)[0]
print(f'X.grad is:{dy}')

which gives the same result as your dy, whereas if you used y=torch.nn.Sigmoid()(torch.matmul(w,X)), then dy is the transpose of your dy.
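
If it helps, here is a quick sketch of that transposed case (my own check, reusing the same X and w; not part of your original code):

import torch

# same toy X and w as above; note that X happens to be symmetric
X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]], requires_grad=True)

# the other convention: y = sigmoid(w @ X) instead of sigmoid(X @ w.T)
y_t = torch.sigmoid(torch.matmul(w, X))
z = torch.ones_like(y_t)
dy_t = torch.autograd.grad(y_t, X, grad_outputs=z, create_graph=True)[0]
print(dy_t)  # transpose of the dy from the X @ w.T version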

Oh dude, your work just now was great.
Maybe I did not express it clearly enough, so you missed the point of my question. After manual verification, the first-order derivative comes out fine.
But something goes wrong when d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0] computes the second derivative. If you calculate the second derivative of this setup by hand, the correct result should be tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]]), yet the output of the above program is tensor([[-0.0818, -0.1726], [-0.1817, -0.1817]]).

I believe PyTorch is correct. Basically the problem is this:
y = sigmoid(x @ w.T)
dy/dx = y*(1-y) @ w
where @ is matrix multiplication and * is element-wise multiplication.
To calculate the second-order derivative d2y/dx2, you can draw a computational graph starting from x and ending at dy/dx, then backpropagate to x. The final answer is this:
d2y/dx2 = (z @ w.T) * (1 - 2y) * y * (1-y) @ w
Of course, when I first tried to derive the above equations I got the wrong answer too… if anyone is good at math, please let me know how it's derived…
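
For reference, a small sketch (my own check, not code from the posts above) that compares this expression with what autograd returns for the same X, w and grad_outputs:

import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]], requires_grad=True)

y = torch.sigmoid(X @ w.T)
z = torch.ones_like(y)

# what the two autograd.grad calls compute: vector-Jacobian products with grad_outputs = ones
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
d2y = torch.autograd.grad(dy, X, grad_outputs=z)[0]

# the closed-form expressions above
dy_manual = (y * (1 - y)) @ w
d2y_manual = ((z @ w.T) * (1 - 2 * y) * y * (1 - y)) @ w

print(torch.allclose(dy, dy_manual))    # True
print(torch.allclose(d2y, d2y_manual))  # True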

This will only work if X is a scalar; what you need is torch.autograd.functional.hessian. The docs for that function are here: torch.autograd.functional.hessian — PyTorch 1.9.1 documentation.
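
For reference, on a newer PyTorch a minimal call might look like the sketch below; note that hessian expects a function with a single scalar output, so here the toy network output is reduced with sum() (my choice, not something from the original code):

import torch
from torch.autograd.functional import hessian

w = torch.tensor([[1., 1.], [0., 1.]])
X = torch.tensor([[2., 1.], [1., 0.]])

# hessian() needs a scalar-valued function, so the toy output is reduced with sum()
def f(inp):
    return torch.sigmoid(inp @ w.T).sum()

H = hessian(f, X)
print(H.shape)  # torch.Size([2, 2, 2, 2]): d2f / dX[i,j] dX[k,l]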

Thank you for telling me about this function, but the PyTorch version I use is too old to support it. As far as I know, the implementation in my code above is also based on the Hessian matrix.

Can you not update to a higher version of pytorch?

If not, what’s the shape of X and y?

X=tensor([[2., 1.],
        [1., 0.]])
y=tensor([[0.9526, 0.7311],
        [0.7311, 0.5000]], grad_fn=<SigmoidBackward>)

So X and y are both shape [2,2]? Is this for a single sample or for all samples?

A single sample.
I use this simple program to verify the correctness of the code block that I will later write into the PINN training.

That example only works because the input arrays both hold scalars; for non-scalar arguments it won't work, which is why the torch.autograd.functional library exists!

Maybe you are right, dude

There's probably a way to do this without the torch.autograd.functional library, although I'm not 100% sure how. You could have a look at the source code for both functions and see if you can reproduce them without upgrading PyTorch.

If you can update pytorch I’d recommend you do it.
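
If upgrading isn't possible, something along these lines might work as a starting point: a rough sketch of my own for the same toy setup, building the full Hessian of a scalar output with plain torch.autograd.grad.

import torch

w = torch.tensor([[1., 1.], [0., 1.]])
X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)

# scalar output of the same toy setup
out = torch.sigmoid(X @ w.T).sum()

# first derivatives, keeping the graph so they can be differentiated again
grad = torch.autograd.grad(out, X, create_graph=True)[0]

# second derivatives: differentiate each entry of grad separately
flat_grad = grad.reshape(-1)
hess = torch.zeros(X.numel(), X.numel())
for i in range(flat_grad.numel()):
    row = torch.autograd.grad(flat_grad[i], X, retain_graph=True)[0]
    hess[i] = row.reshape(-1)

print(hess)  # 4x4 Hessian of the summed output w.r.t. the 4 entries of X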