Automatic gradient and torch.sum

I don’t mean to put you off, but if your aim is to use auto-diff with IPOPT, the best option I have found so far is CasADi. It handles the automatic differentiation internally, and in a few lines of code you get the solution.

import casadi as cs
import numpy as np

# X -> list of CasADi symbolic variables, f -> objective expression in terms of X,
# constraints -> list of CasADi constraint expressions
nlp = {'x': cs.vertcat(*X), 'f': f, 'g': cs.vertcat(*constraints)}
prob = cs.nlpsol('Test', 'ipopt', nlp)
sol = prob(x0=[0]*len(X), lbx=[0]*len(X), ubx=[np.inf]*len(X),
           lbg=[0]*len(constraints), ubg=[0]*len(constraints))
# The above solves assuming X lies in [0, inf) and the constraints are equalities
# (lbg == ubg == 0). If they are inequalities, set lbg/ubg to the lower/upper
# limits of each constraint expression.

Hope it helps.

Sam

Dear Sam, thank you very much for your kind comments. I know that CasADi works very well with IPOPT and automatic differentiation; however, my current project relies heavily on a PyTorch-based library. I can now write everything in PyTorch, compute the gradients with automatic differentiation, and supply them to IPOPT, which makes IPOPT's performance much more stable.
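For reference, here is a minimal sketch of that pattern, assuming cyipopt's SciPy-style minimize_ipopt interface; f_torch below is only a placeholder objective, not the actual model.

import numpy as np
import torch
from cyipopt import minimize_ipopt  # assumption: cyipopt with its SciPy-style interface

def f_torch(x):
    return torch.sum((x - 1.0)**2)  # placeholder objective; replace with the PyTorch-based model

def f(x):
    return f_torch(torch.tensor(x, dtype=torch.float64)).item()

def grad_f(x):
    xt = torch.tensor(x, dtype=torch.float64, requires_grad=True)
    f_torch(xt).backward()       # gradient via PyTorch autograd
    return xt.grad.numpy()       # hand it to IPOPT as a NumPy array

res = minimize_ipopt(f, x0=np.zeros(3), jac=grad_f)
print(res.x)  # should approach [1, 1, 1] for the placeholder objective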


Good to know that, Takafumi.

I was wondering if we could also avoid IPOPT altogether by sticking completely to PyTorch's optimizers (e.g., Adam). For a constrained optimization, we could reformulate the objective with additional variables using the Lagrange multiplier method; inequalities could likewise be handled with slack variables. Have you ever tried it?
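For what it's worth, a minimal sketch of that idea on a made-up two-variable problem, using an augmented-Lagrangian formulation (Adam minimizes over x in an inner loop, and the multiplier is updated in an outer loop); the objective, constraint, learning rate, and penalty weight are all illustrative.

import torch

# Hypothetical problem: minimize f(x) = (x0 - 2)^2 + (x1 + 1)^2
# subject to the equality constraint g(x) = x0 + x1 = 0.
def f(x):
    return (x[0] - 2.0)**2 + (x[1] + 1.0)**2

def g(x):
    return x[0] + x[1]

x = torch.zeros(2, requires_grad=True)
lam = 0.0   # Lagrange multiplier estimate
mu = 10.0   # penalty weight
optim = torch.optim.Adam([x], lr=1e-2)

for outer in range(20):
    for _ in range(500):   # inner loop: minimize the augmented Lagrangian in x
        optim.zero_grad()
        L = f(x) + lam * g(x) + 0.5 * mu * g(x)**2
        L.backward()
        optim.step()
    with torch.no_grad():
        lam += mu * g(x).item()   # outer loop: method-of-multipliers update

print(x.detach().numpy(), g(x).item())   # expected solution is roughly x = (1.5, -1.5)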

Sam

I am writing here just so that others know of my experience fitting an extremely non-linear model (the BWR equation) to available data (7.2k rows of P versus T, rho values).

I tried CasADi/Ipopt as well as pure PyTorch. Both struggle with a random initialization of the 8 parameters. With a good initialization, CasADi gets a fairly good result (the Jacobians are all almost zero at the solution), but PyTorch/Adam could not find a solution, even when starting from the same initial guess that gave me the results with CasADi/Ipopt. Can anyone suggest what might be obviously wrong in my implementation?

import pandas as pd
import torch

P_calc = ...  # complicated function of rho, T, and the 8 parameters
data = pd.read_excel("P_rho-T.xlsx")

# initial guess for the 8 parameters
X0 = [0.1] * 8
X_params = torch.tensor(X0, requires_grad=True)

n_epochs = 100
lr = 1e-2
optim = torch.optim.Adam([X_params], lr=lr)

# training loop: minimize the sum of squared relative errors in P
for _ in range(n_epochs):
    optim.zero_grad()
    obj = sum(((P_calc(T_exp, rho_exp, *X_params, torch) - P_exp) / P_exp)**2
              for _, (T_exp, rho_exp, P_exp) in data.iterrows())
    obj.backward()
    optim.step()
    print(obj.detach().numpy())

On top of that, if I give the solution obtained from CasADi/Ipopt as the initial guess to PyTorch's Adam optimizer (I also tried other optimizers), the solution starts diverging.