Loss function coming from a PDE

Hello,

I’m trying to train a PINN whose physics loss is computed with FEniCS, so I have to detach a torch tensor and provide my own implementation of the gradient. When I train, the loss stays constant after each epoch; it doesn’t increase or decrease. This probably means that the gradient of the loss that torch is obtaining is zero. I would appreciate some help figuring out what I’m doing wrong, as well as any suggestions on best practice for training PINNs this way.

Here is the implementation that I’m using. This is the torch.autograd.Function subclass:

import numpy as np
import torch

import Dg  # my own module wrapping the FEniCS Darcy formulation

class PDE_Loss(torch.autograd.Function):
    # p_dofs: this is the output of the neural net
    # A_matrix_params: this is the input to the neural net
    # formulation is a class that computes the necessary vectors and
    # matrices using FEniCS
    @staticmethod
    def forward(
        ctx,
        p_dofs: torch.Tensor,
        A_matrix_params: torch.Tensor,
        lossfun: torch.nn.Module,
        formulation: Dg.Darcy_primal_formulation,
        f: torch.Tensor,
    ) -> torch.Tensor:
        result = lossfun(
            torch.matmul(
                torch.from_numpy(
                    formulation.assemble_linear_system(A_matrix_params.detach().numpy())
                ),
                p_dofs,
            ),
            f,
        )
        ctx.formulation = formulation
        ctx.lossfun = lossfun
        ctx.save_for_backward(
            p_dofs,
            A_matrix_params,
            f,
            result,
        )
        # print(f"{result=}")
        return result
    # Gradient is estimated using finite differences
    @staticmethod
    def backward(ctx, grad_output):
        eps = 1e-8
        (
            p_dofs,
            A_matrix_params,
            f,
            forward_res,
        ) = ctx.saved_tensors

        formulation = ctx.formulation
        lossfun = ctx.lossfun

        grad = []

        for i in range(3):
            A_matrix_params_eps_away = A_matrix_params.clone()
            A_matrix_params_eps_away[i] += eps

            result_eps_away = lossfun(
                torch.matmul(
                    torch.from_numpy(
                        formulation.assemble_linear_system(
                            A_matrix_params_eps_away.numpy()
                        )
                    ),
                    p_dofs,
                ),
                f,
            )
            grad.append(((result_eps_away - forward_res) / eps) * grad_output)
        return None, torch.tensor(np.array(grad)), None, None, None

Hi Sebastian!

In principle you should be able to make this approach work.

My first suggestion would be to test your gradient implementation with
a single forward pass / backward pass (rather than observing that your
training doesn’t work and trying to deduce that you’re getting a zero
gradient). Take a simple test case for which you are comfortable that
you can reliably compute the gradient with some independent method,
say analytically or with an independent numerical computation.

Does your backward() method (as called by running a forward pass
followed by a backward pass) reproduce the known gradient for your
test case? Does it incorrectly return zero?
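As a concrete illustration of such an isolated test, here is a minimal, self-contained sketch (using a toy function f(x) = (x**2).sum() as a stand-in, not your Darcy formulation): a custom Function whose backward() uses the same one-sided finite-difference scheme as yours, checked against the known analytic gradient 2*x.

```python
import torch

# Toy stand-in for the PDE loss: f(x) = (x**2).sum(), whose analytic
# gradient is 2*x. The backward() estimates it with forward differences,
# mirroring the structure of the PDE_Loss class above.
class FDSquareSum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        x = x.detach()
        eps = 1e-6  # large enough to survive round-off in float64
        base = (x ** 2).sum()
        grad = torch.empty_like(x)
        for i in range(x.numel()):
            x_eps = x.clone()
            x_eps.view(-1)[i] += eps
            grad.view(-1)[i] = ((x_eps ** 2).sum() - base) / eps
        return grad * grad_output

# Single forward pass / backward pass, compared to the known answer:
x = torch.tensor([1.0, -2.0, 3.0], dtype=torch.float64, requires_grad=True)
y = FDSquareSum.apply(x)
y.backward()
print(torch.allclose(x.grad, 2 * x.detach(), atol=1e-4))  # True
```

If a test like this passes for the toy function but your real backward() returns zeros, the problem is in the PDE-specific pieces (eps, dtype, or the assembly call), not in the autograd wiring.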

Note that your eps = 1e-8 is at or below the edge of single-precision
round-off error so that A_matrix_params_eps_away[i] += eps may
end up being numerically equal to A_matrix_params, which would
give you zero for your finite-difference estimate of the gradient.
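You can see this effect directly with a few lines (values here are illustrative placeholders for your parameters):

```python
import torch

# In float32, a perturbation of 1e-8 on a value of order 1 is below
# half the machine epsilon (~1.19e-7), so it is lost to round-off and
# the "perturbed" parameters are bitwise equal to the originals.
a = torch.tensor([1.0, 2.0, 3.0])  # float32 by default
a_eps = a.clone()
a_eps[0] += 1e-8
print(torch.equal(a, a_eps))       # True: the perturbation vanished

# The same perturbation survives in float64:
b = a.double().clone()
b[0] += 1e-8
print(torch.equal(a.double(), b))  # False
```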

If you can find an analytic or semi-analytic formula for the gradient, you
will almost certainly be better off using that than numerical differentiation.
If you have to use numerical (finite-difference) differentiation, it should
be possible to make it work adequately well, although such computations
can be nuanced and rather tricky.

Start by making eps bigger than 1e-8. You might consider performing
the numerical differentiation in double precision if doing so is compatible
with your PDE solver.

In general, if you make eps too big, your finite-difference estimate will
become polluted with higher-order terms. If you make it too small, your
estimate will become noisy due to round-off error, and at some point
eps will, so to speak, round down to zero. I would suggest that you
take a selection of “typical” use cases and try to empirically determine
a reasonable value for eps.
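A rough sketch of such a sweep, using sin/cos as a placeholder test function with a known derivative (substitute a "typical" case from your own problem):

```python
import numpy as np

# Sweep eps on a test function with a known derivative to locate the
# sweet spot between truncation error (eps too large) and round-off
# noise (eps too small). f, df, x0 are placeholders for your own case.
f, df, x0 = np.sin, np.cos, 1.0
errors = {}
for eps in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12]:
    fd = (f(x0 + eps) - f(x0)) / eps  # one-sided forward difference
    errors[eps] = abs(fd - df(x0))
    print(f"eps={eps:.0e}  error={errors[eps]:.2e}")
```

In double precision the error typically bottoms out around eps ~ 1e-8 for a forward difference and grows on both sides of that; in single precision the whole curve shifts toward larger eps.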

You might also look for an existing numerical-differentiation package.
A good one will likely give you more robust results than something you
write yourself. However, such a package will also tend to be a bit tricky
to use – you will probably have to provide it some sort of “control
parameters” that, in effect, give it some guidance as to how large it
should make its internal equivalent of eps.
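For instance, if SciPy fits into your stack (an assumption on my part), scipy.optimize.approx_fprime is a simple off-the-shelf forward-difference gradient, and its epsilon argument is exactly the kind of control parameter meant above:

```python
import numpy as np
from scipy.optimize import approx_fprime

# Stand-in for the PDE loss: a scalar function of the parameters.
def loss(params):
    return float(np.sum(params ** 2))  # analytic gradient: 2 * params

params = np.array([1.0, -2.0, 3.0])
# Third argument is the step size eps; sqrt(machine eps) is a common default.
grad = approx_fprime(params, loss, 1.49e-8)
print(grad)  # close to [2., -4., 6.]
```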

Good luck!

K. Frank