How to backprop a loss with a partial derivative on it (behind .backward())

Hello, I want to understand how .backward() handles this. I know that I can just type loss.backward() and get the correct result by magic but I want to understand it. This is my case:

In PINNs you have additional loss functions, for example one for the boundary condition and the other for the PDE… Let’s focus on the PDE one.

Let’s say that my input is X=(x,t), so the output is Y after a forward propagation.
The conventional approach in backprop is as follows:

Loss = (Y -Yreal)**2

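and then backpropagation applies the chain rule, starting from dLoss/dY = 2*(Y - Yreal) and propagating back through the layers to the weights.

In PyTorch I just need Loss.backward() to achieve the same result.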
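To make the conventional case concrete, here is a minimal sketch comparing Loss.backward() with a hand-written chain rule. The one-layer model Y = X @ W, the tensor shapes, and the random data are assumptions for illustration only, not part of the original setup.

```python
import torch

torch.manual_seed(0)

# Hypothetical one-layer "network": Y = X @ W, with rows of X being (x, t) samples.
X = torch.randn(5, 2)
W = torch.randn(2, 1, requires_grad=True)
Y_real = torch.randn(5, 1)

# Forward pass and squared-error loss, summed over all samples.
Y = X @ W
loss = ((Y - Y_real) ** 2).sum()

# Autograd fills W.grad with dLoss/dW.
loss.backward()

# Manual chain rule: dLoss/dY = 2 * (Y - Y_real), then dLoss/dW = X^T @ dLoss/dY.
dLoss_dY = 2 * (Y - Y_real).detach()
manual_grad = X.T @ dLoss_dY

print(torch.allclose(W.grad, manual_grad, atol=1e-5))
```

For a deeper network the same idea repeats layer by layer; autograd just records the operations and applies these local derivatives in reverse order.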

Now, let’s consider the case where we enforce a differential equation, let’s say the PDE, for simplicity, is:

dy/dt + y**2 + c = 0

To evaluate this PDE I need dy/dt, so I get it from my network by differentiating the output y with respect to the input t (a backward pass aimed at dy/dt, which can also be obtained easily with .backward()).

Then I set my loss function as

L = (dy/dt + y**2 + c)**2    (1)

and my cost is the summation over all samples.

Now comes the question… How do I backpropagate to update my gradients?
In PyTorch I just use .backward(), but what is happening in this case?
As before, I would start with dLoss/dY, which is dL/dy, then considering (1) I have:

dL/dy = d/dy( (dy/dt + y**2 + c)**2 )
      = 2(dy/dt + y**2 + c)*(d/dy(dy/dt)??? + 2y)    (2)

Once dL/dy is obtained, what follows is the usual chain of derivatives, just like in a conventional setup, so no problem there. The problem is in Eq. (2): how do I evaluate dL/dy, or is my approach incorrect? Does PyTorch compute Eq. (2)? The term marked “???” seems odd to me. What is wrong in my reasoning? Thanks for your comments.

I think you would have an easier time using torch.autograd.grad here (with create_graph=True). Personally, I like to think of that as the standard way to get a derivative, and of .backward as a convenience method for when you need gradients of loss functions for an optimizer.
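As a rough sketch of that suggestion, the PDE loss from Eq. (1) could be built like this. The network architecture, the batch of random (x, t) points, and the value of c are all assumptions made up for the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical small network mapping X = (x, t) to y.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)
c = 1.0  # constant in the PDE; the value is assumed

# Collocation points; requires_grad=True so we can differentiate y w.r.t. them.
X = torch.rand(8, 2, requires_grad=True)  # columns: x, t
y = net(X)

# dy/dX for every sample. Each y_i depends only on X_i, so summing via
# grad_outputs=ones gives per-sample derivatives. create_graph=True keeps
# dy/dt differentiable with respect to the network parameters.
(dy_dX,) = torch.autograd.grad(
    y, X, grad_outputs=torch.ones_like(y), create_graph=True
)
dy_dt = dy_dX[:, 1:2]

# PDE residual dy/dt + y**2 + c, squared and summed over all samples.
residual = dy_dt + y ** 2 + c
loss = (residual ** 2).sum()

# Gradients of the loss w.r.t. the parameters, through both y and dy/dt.
loss.backward()
```

Without create_graph=True the returned dy/dt would be detached from the graph, and the later loss.backward() could not propagate through it.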

Best regards


Thanks for your reply. I will consider that, but my biggest concern is how to solve Eq. (2)…or at least to understand how backprop handles the first derivative dL/dy with a PDE. Some help is appreciated…

I think the way it goes is as follows:

dL/dy = 2(dy/dt + y**2 + c)*(2y)    (2)

Previously I also wanted to differentiate dy/dt, but this is not needed because I want to compute the loss between this term, dy/dt, taken as the real value, and the other two expressions (taken as the network output). It is a bit odd, though, because dy/dt was obtained with autograd.grad and is not a real value. Anyway, if this is the correct way, a confirmation would be appreciated.
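One way to check this is numerically. The sketch below uses a hypothetical one-parameter model y = w * t (so dy/dt = w exactly) with assumed values for w, t and c, and compares the gradient of the loss with respect to w in two cases: dy/dt kept in the graph (create_graph=True), and dy/dt detached so it is treated as a fixed number. In this toy setup the two gradients differ, which suggests that autograd does not treat dy/dt as a constant unless it is detached; the extra contribution comes from differentiating dy/dt with respect to the parameter, rather than from a d/dy(dy/dt) term.

```python
import torch

# Hypothetical one-parameter model y = w * t, so dy/dt = w exactly.
w = torch.tensor(0.5, requires_grad=True)
t = torch.tensor(2.0, requires_grad=True)
c = 1.0  # assumed value


def pde_loss(detach_dy_dt):
    """Loss (dy/dt + y**2 + c)**2 for the toy model y = w * t."""
    y = w * t
    # create_graph=True keeps dy/dt connected to w in the autograd graph.
    (dy_dt,) = torch.autograd.grad(y, t, create_graph=True)
    if detach_dy_dt:
        dy_dt = dy_dt.detach()  # treat dy/dt as a fixed number
    return (dy_dt + y ** 2 + c) ** 2


(grad_full,) = torch.autograd.grad(pde_loss(False), w)
(grad_detached,) = torch.autograd.grad(pde_loss(True), w)

# By hand, with residual r = w + (w*t)**2 + c = 2.5:
#   full:     2*r*(1 + 2*w*t**2) = 25
#   detached: 2*r*(2*w*t**2)     = 20
print(grad_full.item(), grad_detached.item())
```

The detached version matches the formula with only the 2y factor (times dy/dw); the full version adds the term from d(dy/dt)/dw, which here equals 1.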