z = torch.ones_like(y)

dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]

print(dy)

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]

print(d2y)

Can you also provide `X` and `y`?

Anyway, here is a self-contained example:

import torch

X=torch.tensor([1.,2.,3.],requires_grad=True)

y=X**3

z = torch.ones_like(y)

dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]

print(dy)

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=True)[0]

print(d2y)

The output of `d2y` is `tensor([ 6., 12., 18.], grad_fn=<MulBackward0>)`, which is just the second-order derivative of y=x**3, isn’t it? y''=6x

I am a newbie to PyTorch. I recently tried to use it to implement the PINN framework. Below is a simplified network I designed to verify that my program computes the second-order derivative correctly, but the output is not what I expect. I also tested the Linear() and Sigmoid() layers separately in the same way, and those results were fine. I don’t know why there is a problem when the two are combined to find the second derivative. I hope you can help me find the problem.

import torch

from torch import nn

x=torch.tensor([[2.,1]],dtype=torch.float32).t()

y=torch.tensor([[1,0]],dtype=torch.float32).t()

net=nn.Sequential(nn.Linear(2,2),nn.Sigmoid())

def init_ones(m):

    if type(m)==nn.Linear:

        nn.init.ones_(m.weight)

        nn.init.zeros_(m.bias)

net.apply(init_ones)

net[0].weight.data[1][0]=0.0

net[0].weight,net[0].bias

```
(Parameter containing:
tensor([[1., 1.],
[0., 1.]], requires_grad=True),
Parameter containing:
tensor([0., 0.], requires_grad=True))
```

X=torch.cat((x,y),1)

X.requires_grad_(True)

y=net(X)

z = torch.ones_like(y)

dy = torch.autograd.grad(y, X, grad_outputs=z,retain_graph=True, create_graph=True)[0]

print(dy)

d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]

print(d2y)

Perhaps it has something to do with how nn.Linear is defined, i.e. y=x@A.T, where ‘@’ is matmul, rather than the usual y=A@X. During backprop, dy/dx == grad @ A, whereas if y=A@X, then dy/dx == A.T@grad.

I am a new learner myself, and I didn’t notice this until now, thanks dude!

BTW, you can test this by using:

import torch

X=torch.tensor([[2., 1.],[1., 0.]],requires_grad=True)

w=torch.tensor([[1., 1.],[0., 1.]],requires_grad=True)

y=torch.nn.Sigmoid()(torch.matmul(X,w.T))

print(f'weight:{w}\n')

print(f'input:{X}\n')

print(f'output:{y}\n')

z = torch.ones_like(y)

dy = torch.autograd.grad(y, X, grad_outputs=z,retain_graph=True, create_graph=True)[0]

print(f'X.grad is:{dy}')

This gives the same result as your dy, whereas if you used y=torch.nn.Sigmoid()(torch.matmul(w,X)), then dy would be the transpose of your dy.
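Both claims can be checked with a short sketch. It assumes the same `X` and `w` as above, and the helper name `input_grad` is made up for illustration. Note that `X` here happens to be symmetric, which is what makes the second variant come out as an exact transpose; for a general `X` the two results need not be related this way.

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]])

def input_grad(y, X):
    # Vector-Jacobian product with an all-ones vector (hypothetical helper).
    return torch.autograd.grad(y, X, grad_outputs=torch.ones_like(y))[0]

# nn.Linear convention: y = X @ w.T, so the gradient is (z * y * (1 - y)) @ w
y_row = torch.sigmoid(X @ w.T)
dy_row = input_grad(y_row, X)
manual_row = (y_row * (1 - y_row)).detach() @ w

# Other convention: y = w @ X, so the gradient is w.T @ (z * y * (1 - y))
y_col = torch.sigmoid(w @ X)
dy_col = input_grad(y_col, X)

print(torch.allclose(dy_row, manual_row))  # autograd matches the manual formula
print(torch.allclose(dy_col, dy_row.T))    # transpose relation (X is symmetric here)
```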

Oh, dude, nice work just now.

Maybe I did not express myself clearly enough, so the point of my question did not come across. I have verified by hand that the first-order derivative is correct.

The problem appears when the second derivative is calculated with `d2y = torch.autograd.grad(dy, X, grad_outputs=z, create_graph=False)[0]` followed by `print(d2y)`.

You can work out the second derivative of this network by hand. The correct result should be `tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]])`,

but the output of the above program is instead `tensor([[-0.0818, -0.1726], [-0.1817, -0.1817]])`
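For reference, the expected values quoted above match the element-wise second derivative of the sigmoid evaluated at the pre-activations `u = X @ w.T`, that is, y(1-y)(1-2y). A minimal sketch, assuming the `X` and weight values from earlier in the thread:

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]])
w = torch.tensor([[1., 1.], [0., 1.]])

u = X @ w.T                              # pre-activations: [[3, 1], [1, 0]]
y = torch.sigmoid(u)
sigmoid_dd = y * (1 - y) * (1 - 2 * y)   # sigma''(u), element-wise

expected = torch.tensor([[-0.0409, -0.0909], [-0.0909, 0.0000]])
print(sigmoid_dd)
print(torch.allclose(sigmoid_dd, expected, atol=1e-4))
```

So the hand calculation is a derivative with respect to `u`, element by element. The nested `torch.autograd.grad` call differentiates with respect to `X` and sums the contributions weighted by `grad_outputs`, which is a different quantity.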

I believe pytorch is correct. Basically the problem is this:

y = sigmoid(x @ w.T)

dy/dx = (z * y * (1-y)) @ w

where @ is matrix multiplication, * is element-wise multiplication, and z is the all-ones tensor passed as grad_outputs.

To calculate the second-order derivative d2y/dx2, you can draw a computational graph starting from x and ending at dy/dx, then backpropagate to x. The final answer is this:

d2y/dx2 = ((z @ w.T) * (1 - 2y) * y * (1-y)) @ w

Of course, when I tried to derive the above equations, I got the wrong answer too… if anyone is good at math please let me know how it’s derived…
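One way to see where that expression comes from: write dy = g @ w with g = z * σ'(u) and u = x @ w.T. The second `autograd.grad` call returns the gradient of the scalar sum(z * dy), which equals sum(g * (z @ w.T)). Differentiating with respect to u gives z * σ''(u) * (z @ w.T), with σ''(u) = y(1-y)(1-2y), and the chain rule through u = x @ w.T adds the trailing `@ w`. The sketch below checks this numerically, assuming the `X` and weight values used earlier in the thread; `closed_form` is just an illustrative name.

```python
import torch

X = torch.tensor([[2., 1.], [1., 0.]], requires_grad=True)
w = torch.tensor([[1., 1.], [0., 1.]])

y = torch.sigmoid(X @ w.T)
z = torch.ones_like(y)

# Same nested calls as in the thread
dy = torch.autograd.grad(y, X, grad_outputs=z, create_graph=True)[0]
d2y = torch.autograd.grad(dy, X, grad_outputs=z)[0]

# Closed form from the derivation above (z is all ones)
with torch.no_grad():
    closed_form = ((z @ w.T) * (1 - 2 * y) * y * (1 - y)) @ w

print(d2y)
print(torch.allclose(d2y, closed_form))
```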

This will only work if `X` is a scalar; what you need is `torch.autograd.functional.hessian`. The docs for that function are here: torch.autograd.functional.hessian (PyTorch 1.9.1 documentation)
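A minimal usage sketch for `torch.autograd.functional.hessian`, assuming a PyTorch version that ships `torch.autograd.functional`. The function `f` is a made-up example; `hessian` expects a function that returns a single scalar, so the element-wise cube from earlier in the thread is summed first.

```python
import torch
from torch.autograd.functional import hessian

def f(x):
    # hessian() needs a scalar-valued function, so reduce with sum()
    return (x ** 3).sum()

x = torch.tensor([1., 2., 3.])
H = hessian(f, x)          # shape (3, 3): d^2 f / dx_i dx_j

print(H)
print(torch.diagonal(H))   # pure second derivatives, 6 * x
```

For this element-wise function the off-diagonal entries are zero, which is why the nested `torch.autograd.grad` calls with all-ones `grad_outputs` already gave 6x in the `X**3` example.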

Thank you for telling me about this function, but the PyTorch version I use is too old to support it. As far as I know, the implementation in my code above is also based on the Hessian matrix.

Can you not update to a higher version of pytorch?

If not, what’s the shape of `X` and `y`?

```
X=tensor([[2., 1.],
[1., 0.]])
```

```
y=tensor([[0.9526, 0.7311],
[0.7311, 0.5000]], grad_fn=<SigmoidBackward>)
```

So `X` and `y` are both shape `[2,2]`? Is this for a single sample or for all samples?

Single sample.

I use this simple program to verify the correctness of the code block function that I will write into the PINN deep training.

That example only works because the input arrays both hold scalars; for non-scalar arguments it won’t work, which is why the `torch.autograd.functional` library exists!

Maybe you are right, dude

There’s probably a way to do this without the `torch.autograd.functional` library, although I’m not 100% sure how. You could have a look at the source code for both functions and see if you can reproduce them without upgrading pytorch?

If you can update pytorch I’d recommend you do it.
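On doing this without `torch.autograd.functional`: one possible approach, sketched below, is to loop over output and input elements and call `torch.autograd.grad` twice per pair. This is not how the library implements `hessian`, just an illustration; the helper name `pure_second_derivatives` is hypothetical, and the loop is only practical for small tensors. It assumes the `X` and weight values from this thread.

```python
import itertools
import torch

def pure_second_derivatives(f, X):
    """Return d2[i, j, k, l] = d^2 f(X)[i, j] / d X[k, l]^2 (hypothetical helper)."""
    X = X.detach().clone().requires_grad_(True)
    y = f(X)
    d2 = torch.zeros(y.shape + X.shape)
    for out_idx in itertools.product(*(range(n) for n in y.shape)):
        # First derivative of one scalar output with respect to all of X
        (g,) = torch.autograd.grad(y[out_idx], X, create_graph=True)
        for in_idx in itertools.product(*(range(n) for n in X.shape)):
            # Differentiate one entry of g again and keep the matching entry
            (h,) = torch.autograd.grad(g[in_idx], X, retain_graph=True)
            d2[out_idx + in_idx] = h[in_idx]
    return d2

X = torch.tensor([[2., 1.], [1., 0.]])
w = torch.tensor([[1., 1.], [0., 1.]])

d2 = pure_second_derivatives(lambda inp: torch.sigmoid(inp @ w.T), X)
print(d2.shape)        # one entry per (output element, input element) pair
print(d2[0, 0, 0, 0])  # d^2 y[0, 0] / d X[0, 0]^2
```

Unlike the nested call with all-ones `grad_outputs`, this keeps each second derivative separate instead of summing mixed terms across outputs and inputs.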