# The following is my code block for computing the second-order derivative of a matrix. Why is the output second-order derivative different from the one I calculated manually?

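z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]  # first derivative, graph kept
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]  # differentiate dy again
print(d2y)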

Can you also provide X and y?
Anyway,
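import torch
X = torch.tensor([1., 2., 3.], requires_grad=True)  # values inferred from the printed d2y below
y = X**3
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]  # 3*x**2
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]  # 6*x
print(d2y)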

The output of d2y is `tensor([ 6., 12., 18.], grad_fn=<MulBackward0>)`, which is exactly the second-order derivative of y = x**3, isn't it? y'' = 6x.

I am new to PyTorch and recently tried to use it to implement the PINN framework. Below is a simplified network I designed to verify that my second-order derivative code is correct, but the output does not match what I expect. I also tested the Linear() and Sigmoid() layers separately in the same way, and those results were fine. I don't know why a problem appears when the two are combined to compute the second derivative. I hope you can help me find it.

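import torch
from torch import nn

x = torch.tensor([[2., 1.]], dtype=torch.float32).t()
y = torch.tensor([[1., 0.]], dtype=torch.float32).t()
net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid())

def init_ones(m):
    # Set every Linear layer's weights to 1 and its biases to 0.
    if type(m) == nn.Linear:
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

net.apply(init_ones)
# Zero one weight so that W = [[1, 1], [0, 1]]; this index is inferred
# from the outputs quoted in this thread, not stated in the original post.
net[0].weight.data[1, 0] = 0.0
net[0].weight, net[0].bias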

``````
(Parameter containing:
 tensor([[1., 1.],
         [0., 1.]], requires_grad=True),
 Parameter containing:
 tensor([0., 0.], requires_grad=True))
``````

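X = torch.cat((x, y), 1).requires_grad_(True)  # X must track gradients
y = net(X)
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]  # first derivative, graph kept
print(dy)
d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]  # differentiate dy again
print(d2y)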

Perhaps it has something to do with how nn.Linear is defined, i.e. y = x @ A.T, where '@' is matmul, rather than the usual y = A @ x. During backprop, dy/dx == grad @ A, whereas if y = A @ x, then dy/dx == A.T @ grad.
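To make the convention above concrete, here is a small sketch that checks both the forward rule and the two backward rules numerically. The shapes, the seed, and the names `lin`, `A`, `x_col` are illustrative choices, not taken from the original posts.

```python
import torch
from torch import nn

torch.manual_seed(0)

lin = nn.Linear(3, 2, bias=False)           # weight A has shape (out=2, in=3)
A = lin.weight.detach()

# Row-vector convention used by nn.Linear: y = x @ A.T
x = torch.randn(4, 3, requires_grad=True)   # batch of 4 row vectors
y = lin(x)
same_forward = torch.allclose(y, x @ A.T, atol=1e-6)

grad = torch.randn_like(y)                  # arbitrary upstream gradient
(dx,) = torch.autograd.grad(y, x, grad_outputs=grad)
same_backward = torch.allclose(dx, grad @ A, atol=1e-6)

# Column-vector convention for comparison: y_col = A @ x_col
x_col = torch.randn(3, 4, requires_grad=True)
y_col = A @ x_col
grad_col = torch.randn_like(y_col)
(dx_col,) = torch.autograd.grad(y_col, x_col, grad_outputs=grad_col)
same_backward_col = torch.allclose(dx_col, A.T @ grad_col, atol=1e-6)

print(same_forward, same_backward, same_backward_col)
```

All three checks should agree, which suggests the layout of nn.Linear changes where the transpose appears but not the values autograd computes.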

I am a new learner myself, and I didn't notice this until now. Thanks, dude!

BTW, you can test this by using:

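import torch
# Assumed values: the question's X and a Linear weight of [[1, 1], [0, 1]].
w = torch.tensor([[1., 1.], [0., 1.]])
X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
y = torch.nn.Sigmoid()(torch.matmul(X, w.T))
print(f'weight:{w}\n')
print(f'input:{X}\n')
print(f'output:{y}\n')
z = torch.ones_like(y)
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
print(dy)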

This gives the same result as your dy, whereas if you used y=torch.nn.Sigmoid()(torch.matmul(w,X)), dy is the transpose of your dy.

Oh, dude, your work just now was great.
Maybe I did not express myself clearly enough, so the point of my question did not come across. I verified the first-order derivative by hand, and it is fine.
The problem appears when `d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)` followed by `print(d2y)` computes the second derivative. You can work out the second derivative of this process by hand. The correct result should be `tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]])`, but the output of the above program is instead `tensor([[-0.0818, -0.1726], [-0.1817, -0.1817]])`.
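One possible reading of the hand-computed numbers, offered as an assumption rather than something the poster stated: they equal sigmoid''(a) evaluated element-wise at the pre-activation a = X @ W.T, that is, a derivative with respect to a rather than with respect to X. The sketch below reproduces them under that reading; the weight `W = [[1, 1], [0, 1]]` and the zero bias are inferred from the numbers in the thread.

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]])
W = torch.tensor([[1., 1.], [0., 1.]])    # assumed weight; bias assumed zero

# Treat the pre-activation a = X @ W.T as the variable we differentiate against.
a = (X @ W.T).requires_grad_(True)
y = torch.sigmoid(a)
ones = torch.ones_like(y)

(dy_da,) = torch.autograd.grad(y, a, grad_outputs=ones, create_graph=True)
(d2y_da2,) = torch.autograd.grad(dy_da, a, grad_outputs=ones)

# Closed form of the element-wise second derivative of the sigmoid.
closed_form = (y * (1 - y) * (1 - 2 * y)).detach()

print(d2y_da2)
print(torch.allclose(d2y_da2, closed_form, atol=1e-6))
```

Differentiating with respect to X instead sends every gradient back through the weight matrix, so the entries get mixed and scaled. That would account for the different numbers autograd reports.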

I believe PyTorch is correct. Basically the problem is this:
y = sigmoid(x @ w.T)
dy/dx = (y * (1 - y)) @ w
where @ is matrix multiplication and * is element-wise multiplication (dy/dx here is the gradient autograd returns for grad_outputs of all ones).
To calculate the second-order derivative d2y/dx2, you can draw a computational graph starting from x and ending at dy/dx, then backpropagate to x. The final answer is this:
d2y/dx2 = ((z @ w.T) * (1 - 2*y) * y * (1 - y)) @ w
Of course, when I tried to derive the above equations by hand, I got the wrong answer too... If anyone is good at math, please let me know how it's derived.
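A short sketch of where the formula comes from, under the assumption that both backward passes use the same all-ones `z`. The first pass gives dy = (z * s'(a)) @ w with a = x @ w.T and s' the sigmoid derivative. The second pass differentiates the scalar sum(z * dy) = sum((z @ w.T) * z * s'(a)), so the gradient with respect to a is (z @ w.T) * z * s''(a), and one more multiplication by w maps it back to x. The code below compares this closed form with autograd; the weight values and zero bias are inferred from the thread's numbers.

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]])    # assumed weight; bias assumed zero

y = torch.sigmoid(X @ w.T)
z = torch.ones_like(y)

# Double backward through autograd, as in the question.
(dy,) = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)
(d2y,) = torch.autograd.grad(dy, X, grad_outputs=z)

# Closed form with explicit grouping; s''(a) = (1 - 2y) * y * (1 - y).
yd = y.detach()
d2y_formula = ((z @ w.T) * (1 - 2 * yd) * yd * (1 - yd)) @ w

print(d2y)
print(torch.allclose(d2y, d2y_formula, atol=1e-6))
```

With these inputs `d2y` matches the `tensor([[-0.0818, -0.1726], [-0.1817, -0.1817]])` reported above up to rounding, so the program's output is consistent with the formula.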

This approach only works if `X` is a scalar; what you need is `torch.autograd.functional.hessian`. The docs for that function are in the PyTorch 1.9.1 documentation for torch.autograd.functional.hessian.
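As a rough illustration of that function, here is a minimal sketch for one sample. `hessian` expects a function with a scalar output, so the two sigmoid outputs are summed; that reduction, the function name `f`, and the weight values are assumptions made for this example, not part of the original code.

```python
import torch
from torch.autograd.functional import hessian

w = torch.tensor([[1., 1.], [0., 1.]])    # assumed weight; bias assumed zero

def f(x):
    # Scalar-valued function of one sample x of shape (2,).
    return torch.sigmoid(x @ w.T).sum()

x = torch.tensor([2., 1.])
H = hessian(f, x)     # shape (2, 2); H[i, j] = d^2 f / (dx_i dx_j)
print(H)
```

The result is a full matrix of mixed second derivatives for that sample, which a single `torch.autograd.grad` call on `dy` does not give you, since that call returns a vector-Jacobian product.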

Thank you for telling me about this function, but my PyTorch version is too old to support it. As far as I know, my code above is also based on the Hessian matrix.

Can you not update to a newer version of PyTorch?

If not, what’s the shape of `X` and `y`?

``````
X=tensor([[2., 1.],
          [1., 0.]])
``````
``````
y=tensor([[0.9526, 0.7311],
          [0.7311, 0.5000]])
``````

So `X` and `y` are both shape `[2,2]`? Is this for a single sample or for all samples?

Single sample.
I use this simple program to verify the correctness of the code block that I will put into the PINN training code.

That example only works because the input arrays both hold scalars. For non-scalar arguments it won't work, which is why the `torch.autograd.functional` library exists!

Maybe you are right, dude.

There's probably a way to do this without the `torch.autograd.functional` library, although I'm not 100% sure how. You could look at the source code for both functions and see if you can reproduce them without upgrading PyTorch.
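One way this might be done with only `torch.autograd.grad` is to take the gradient once with `create_graph=True` and then differentiate each gradient component in a loop. The helper name `manual_hessian`, the scalar test function `f`, and the weight values are hypothetical choices for this sketch; it is not a copy of the library implementation.

```python
import torch

def manual_hessian(f, x):
    """Hessian of a scalar function f at a 1-D point x, using torch.autograd.grad only."""
    x = x.detach().requires_grad_(True)
    # First derivative, keeping the graph so it can be differentiated again.
    (g,) = torch.autograd.grad(f(x), x, create_graph=True)
    rows = []
    for i in range(x.numel()):
        # Differentiating the i-th gradient component yields row i of the Hessian.
        (row,) = torch.autograd.grad(g[i], x, retain_graph=True)
        rows.append(row)
    return torch.stack(rows)

w = torch.tensor([[1., 1.], [0., 1.]])    # assumed weight; bias assumed zero

def f(x):
    # Scalar output: sum of the two sigmoid units for one sample.
    return torch.sigmoid(x @ w.T).sum()

H = manual_hessian(f, torch.tensor([2., 1.]))
print(H)
```

The loop costs one extra backward pass per input dimension, which is fine for a small verification case like this one but may be slow for large inputs.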

If you can update PyTorch, I'd recommend doing so.