Hello, I want to understand how .backward() handles this. I know that I can just type loss.backward() and get the correct result by magic, but I want to understand what it does. This is my case:
In PINNs you have additional loss functions, for example one for the boundary condition and another for the PDE… Let's focus on the PDE one.
Let’s say that my input is X=(x,t), so the output is Y after a forward propagation.
The conventional approach in backprop is as follows:
Loss = (Y - Yreal)**2
and then backpropagation goes as
(dLoss/dY)(dY/dZ)⋯
In PyTorch I just need Loss.backward() to achieve the same result.
Now, let’s consider the case where we enforce a differential equation, let’s say the PDE, for simplicity, is:
dy/dt + y**2 + c = 0
To define this PDE residual I need dy/dt, so I use my network to get this value (a backward pass focused on dy/dt; it can easily be obtained using .backward()).
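One common way to obtain dy/dt from the network (a sketch, with a hypothetical small network standing in for the PINN) is `torch.autograd.grad` with `create_graph=True`, so that the derivative itself stays part of the autograd graph and can later appear inside a loss:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in network; only t needs requires_grad=True,
# because the PDE involves dy/dt.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)

x = torch.rand(8, 1)
t = torch.rand(8, 1, requires_grad=True)

y = net(torch.cat([x, t], dim=1))    # forward pass on inputs (x, t)

# dy/dt at every sample point. create_graph=True keeps this derivative
# differentiable, so a loss built from it can be backpropagated later.
dy_dt, = torch.autograd.grad(
    y, t,
    grad_outputs=torch.ones_like(y),
    create_graph=True,
)
```

Because of `create_graph=True`, `dy_dt` has `requires_grad=True`: it is an ordinary node in the graph, not a detached number.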
Then I set my loss function as
L = (dy/dt + y**2 + c)**2    … (1)
and my cost is the summation over all samples.
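Putting the two pieces together, the PDE loss of Eq. (1) can be sketched like this (again with a placeholder network, and an arbitrary value for the constant c that I chose just for the example):

```python
import torch

torch.manual_seed(0)

c = 0.5  # placeholder value for the constant c in the PDE

net = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)

x = torch.rand(8, 1)
t = torch.rand(8, 1, requires_grad=True)
y = net(torch.cat([x, t], dim=1))

# dy/dt, kept differentiable so the loss below can be backpropagated.
dy_dt, = torch.autograd.grad(
    y, t, grad_outputs=torch.ones_like(y), create_graph=True
)

residual = dy_dt + y ** 2 + c   # the PDE residual of Eq. (1), before squaring
loss = (residual ** 2).sum()    # cost: summation of L over all samples
loss.backward()                 # gradients w.r.t. every network parameter
```

The final `loss.backward()` here differentiates through both `y` and `dy_dt`, which is exactly the step the question below is about.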
Now comes the question… How do I backpropagate to compute the gradients and update my parameters?
In PyTorch I just use .backward(), but what is happening in this case?
As before, I would start with dLoss/dY, which here is dL/dy; then, considering (1), I have:
dL/dy = d/dy( (dy/dt + y**2 + c)**2 )
      = 2(dy/dt + y**2 + c) * (d/dy(dy/dt)??? + 2y)    … (2)
Once dL/dy is obtained, what follows is the usual chain rule of differentiations, just like in a conventional setup, so there is no problem there. The problem is in (2): how do I set up "dL/dy"? Or is my approach incorrect? Does PyTorch actually compute Eq. (2)? There is that term marked "???" that seems odd to me. What is wrong in my reasoning? Thanks for your comments.