Automatic gradient and torch.sum

I don’t mean to put you off, but if your aim is to use auto-diff with IPOPT, the best option I have found so far is CasADi. It handles the automatic differentiation internally, and in a few lines of code you get the solution.

import casadi as cs
import numpy as np

# X -> list of CasADi symbolic variables, f -> objective expression in terms of X,
# constraints -> list of CasADi constraint expressions
nlp = {'x': cs.vertcat(*X), 'f': f, 'g': cs.vertcat(*constraints)}
prob = cs.nlpsol('Test', 'ipopt', nlp)
sol = prob(x0=[0]*len(X), lbx=[0]*len(X), ubx=[np.inf]*len(X),
           lbg=[0]*len(constraints), ubg=[0]*len(constraints))
# The above solves assuming X lies in [0, inf) and the constraints are equalities
# (lbg == ubg == 0). If they are inequalities, set lbg/ubg to the lower/upper
# limits of each constraint expression.

Hope it helps.

Sam

Dear Sam, thank you very much for your kind comments. I know that CasADi works very well with IPOPT and automatic differentiation; however, my current project relies heavily on a PyTorch-based library. I can now write everything in PyTorch, compute the gradients with automatic differentiation, and supply them to IPOPT, which makes IPOPT's performance much more stable.
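For reference, here is a minimal sketch of that pattern, assuming cyipopt's SciPy-style minimize_ipopt interface; f_torch below is only a placeholder objective, not the actual model.

import numpy as np
import torch
from cyipopt import minimize_ipopt  # assumption: cyipopt with its SciPy-style interface

def f_torch(x):
    return torch.sum((x - 1.0)**2)  # placeholder objective; replace with the PyTorch-based model

def f(x):
    return f_torch(torch.tensor(x, dtype=torch.float64)).item()

def grad_f(x):
    xt = torch.tensor(x, dtype=torch.float64, requires_grad=True)
    f_torch(xt).backward()       # gradient via PyTorch autograd
    return xt.grad.numpy()       # hand it to IPOPT as a NumPy array

res = minimize_ipopt(f, x0=np.zeros(3), jac=grad_f)
print(res.x)  # should approach [1, 1, 1] for the placeholder objective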


Good to know that, Takafumi.

I was wondering if we could also avoid IPOPT altogether by sticking completely to PyTorch's optimizers (e.g., Adam). For a constrained optimization, we could reformulate the objective with additional variables using the Lagrange multiplier method; inequalities could likewise be handled with slack variables. Have you ever tried it?
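For what it's worth, a minimal sketch of that idea on a made-up two-variable problem, using an augmented-Lagrangian formulation (Adam minimizes over x in an inner loop, and the multiplier is updated in an outer loop); the objective, constraint, learning rate, and penalty weight are all illustrative.

import torch

# Hypothetical problem: minimize f(x) = (x0 - 2)^2 + (x1 + 1)^2
# subject to the equality constraint g(x) = x0 + x1 = 0.
def f(x):
    return (x[0] - 2.0)**2 + (x[1] + 1.0)**2

def g(x):
    return x[0] + x[1]

x = torch.zeros(2, requires_grad=True)
lam = 0.0   # Lagrange multiplier estimate
mu = 10.0   # penalty weight
optim = torch.optim.Adam([x], lr=1e-2)

for outer in range(20):
    for _ in range(500):   # inner loop: minimize the augmented Lagrangian in x
        optim.zero_grad()
        L = f(x) + lam * g(x) + 0.5 * mu * g(x)**2
        L.backward()
        optim.step()
    with torch.no_grad():
        lam += mu * g(x).item()   # outer loop: method-of-multipliers update

print(x.detach().numpy(), g(x).item())   # expected solution is roughly x = (1.5, -1.5)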

Sam

I am writing here just so that others know of my experience fitting an extremely non-linear model (the BWR equation) to available data (7.2k rows of P versus T, rho values).

I tried CasADi/Ipopt as well as pure PyTorch. Both struggle with a random initialization of the 8 parameters. With a good initialization, CasADi gets a fairly good result (the Jacobians are all almost zero at the solution), but PyTorch/Adam could not find a solution, even when starting from the same initial guess that gave me the results with CasADi/Ipopt. Can anyone suggest what might be obviously wrong in my implementation?

import pandas as pd
import torch

P_calc = ...  # complicated function of rho, T, and the 8 parameters
data = pd.read_excel("P_rho-T.xlsx")

# initial guess for the 8 parameters
X0 = [0.1] * 8
X_params = torch.tensor(X0, requires_grad=True)

n_epochs = 100
lr = 1e-2
optim = torch.optim.Adam([X_params], lr=lr)

# training loop: minimize the sum of squared relative errors in P
for _ in range(n_epochs):
    optim.zero_grad()
    obj = sum(((P_calc(T_exp, rho_exp, *X_params, torch) - P_exp) / P_exp)**2
              for _, (T_exp, rho_exp, P_exp) in data.iterrows())
    obj.backward()
    optim.step()
    print(obj.detach().numpy())

On top of that, if I give the solution obtained from CasADi/Ipopt as the initial guess to PyTorch's Adam optimizer (I also tried other optimizers), the solution starts diverging.