Fitting a rational polynomial to complex data - Why is my loss not changing? What am I doing wrong?

I am trying to fit a rational function to complex data using PyTorch. The loss is computed without error, but it never decreases during optimization. I have successfully done this fit with scipy.optimize.minimize; my intention is to see whether I can do the same thing in PyTorch. Kindly assist, as I am still a newbie with PyTorch. My example is below.

import numpy as np
import torch
from torch import nn

# Data
F = torch.tensor([8, 25, 42, 59, 76, 93, 127, 161, 246, 314, 399, 501, 637, 788, 993, 1252,
           1585, 1995, 2512, 3162, 3981, 5012, 6310, 7943, 10000, 12589, 15849, 19953, 25119, 31623, 39811, 50119,
          63096,   79433,  100000,  125893,  158490,  199527, 251189,  316228, 398108,  501188,  630958,  794329, 1000000])

Y = torch.tensor([2.7820e-05+1.9965e-05j, 4.4625e-05+2.9172e-05j, 5.4679e-05+3.3440e-05j,
        6.1465e-05+3.6670e-05j, 6.7193e-05+3.8804e-05j, 7.1745e-05+4.1246e-05j,
        7.8491e-05+4.4649e-05j, 8.4303e-05+4.7946e-05j, 9.4247e-05+5.5973e-05j,
        9.9564e-05+6.2098e-05j, 1.0543e-04+6.8537e-05j, 1.1094e-04+7.7572e-05j,
        1.1712e-04+8.9025e-05j, 1.2216e-04+1.0136e-04j, 1.2858e-04+1.1761e-04j,
        1.3547e-04+1.3883e-04j, 1.4320e-04+1.6582e-04j, 1.5198e-04+1.9882e-04j,
        1.6214e-04+2.3993e-04j, 1.7473e-04+2.9009e-04j, 1.9064e-04+3.5326e-04j,
        2.1126e-04+4.3044e-04j, 2.3898e-04+5.2610e-04j, 2.7717e-04+6.4262e-04j,
        3.2993e-04+7.8392e-04j, 4.0355e-04+9.5308e-04j, 5.0546e-04+1.1531e-03j,
        6.4983e-04+1.3836e-03j, 8.4780e-04+1.6383e-03j, 1.1141e-03+1.9142e-03j,
        1.4616e-03+2.1828e-03j, 1.8944e-03+2.4220e-03j, 2.4044e-03+2.6006e-03j,
        2.9653e-03+2.6870e-03j, 3.5370e-03+2.6675e-03j, 4.0787e-03+2.5499e-03j,
        4.5529e-03+2.3500e-03j, 4.9540e-03+2.1088e-03j, 5.2742e-03+1.8576e-03j,
        5.5202e-03+1.5838e-03j, 5.7254e-03+1.3473e-03j, 5.8689e-03+1.1367e-03j,
        5.9645e-03+9.5294e-04j, 6.0288e-03+7.7854e-04j, 6.1126e-03+6.4864e-04j])

sigma_Y = torch.tensor([2.6392e-08, 4.1651e-08, 5.0401e-08, 5.6586e-08, 6.1554e-08, 6.5615e-08,
        7.2068e-08, 7.7146e-08, 8.6435e-08, 9.1971e-08, 9.7485e-08, 1.0291e-07,
        1.0886e-07, 1.1439e-07, 1.2080e-07, 1.2778e-07, 1.3563e-07, 1.4389e-07,
        1.5327e-07, 1.6397e-07, 1.7726e-07, 1.9387e-07, 2.1571e-07, 2.4450e-07,
        2.8238e-07, 3.3201e-07, 3.9591e-07, 4.7788e-07, 5.7940e-07, 7.0818e-07,
        8.6534e-07, 1.0613e-06, 1.3125e-06, 1.6464e-06, 2.1095e-06, 2.7707e-06,
        3.7069e-06, 5.0162e-06, 6.7130e-06, 8.6371e-06, 1.0583e-05, 1.2018e-05,
        1.2690e-05, 1.2639e-05, 1.2236e-05])
# polyval function
def polyval(p, x):
    '''Evaluate polynomial p(x) using Horner's Method.

    Coefficients are ordered highest degree first (numpy.polyval convention):
        p[0] * x^(n-1) + p[1] * x^(n-2) + ... + p[n-1]
    '''
    result = torch.zeros_like(x)
    for coeff in p:
        result = x * result + coeff
    return result


# rational function
def pade_approx(x, *p):  
    norder = int((len(p)-1)/2)
    a =  torch.nn.Softplus()(torch.tensor(p, requires_grad=True)[0:norder+1])
    b =  torch.nn.Softplus()(torch.concat([torch.tensor(p, requires_grad=True)[norder+1:2*norder+1], torch.tensor([1])], axis = 0))
    Ypa = polyval(a,torch.sqrt(1j*x))/polyval(b,torch.sqrt(1j*x))
    return torch.concat([(Ypa).real, (Ypa).imag], dim = 0)

# loss function
def obj_fun(p, x, y, yerr):
    ndata = len(x)
    npar = len(p)
    dof = (2*ndata-(npar))
    y_concat = torch.concat([y.real, y.imag], dim = 0)
    sigma = torch.concat([yerr, yerr], dim = 0)
    y_model = pade_approx(x, *p)
    chi_sqr = (1/dof)*(torch.sum(torch.abs((1/sigma**2) * (y_concat - y_model)**2)))
    return (chi_sqr)

order = 5
p0 = torch.ones(2*order+1, requires_grad=True)
pade_model = {"param":p0, "func": pade_approx, "obj_fun":obj_fun}

def fit(model, x, y, yerr, steps=int(5e5)):

    p = model["param"]
    noise_free_prediction = model["func"]
    obj_fun = model["obj_fun"]

    optimizer = torch.optim.Adam(params=[p], lr=1e-2)
    # optimizer = torch.optim.LBFGS(params=[p], max_iter=10000, lr=0.1)


    for i in range(steps):

        optimizer.zero_grad()
        loss = obj_fun(p, x, y, yerr)

        # print progress ten times over the course of the run
        if i % int(steps / 10) == 0:
            print(str(i) + ": loss=" + "{:5.3e}".format(loss))

        loss.backward()
        optimizer.step()

from datetime import datetime
start = datetime.now()
print(start)


fit(pade_model, F, Y, sigma_Y, steps=int(1e4))  # 1e4 steps is enough to see whether the loss moves
end = datetime.now()
print(end-start)

#Here is the output
#2022-06-05 18:53:53.428699
#0: loss=5.869e+13
#1000: loss=5.869e+13
#2000: loss=5.869e+13
#3000: loss=5.869e+13
#4000: loss=5.869e+13
#5000: loss=5.869e+13
#6000: loss=5.869e+13
#7000: loss=5.869e+13
#8000: loss=5.869e+13
#9000: loss=5.869e+13
#0:00:25.424747


Hi Richard!

Without addressing the rest of your code, your primary problem is in pade_approx().

Using the torch.tensor() factory function to build a new tensor from
an existing tensor “breaks the computation graph” (that is, gradients
won’t backpropagate through it), and is generally not recommended.

Consider:

>>> import torch
>>> torch.__version__
'1.11.0'
>>> t = torch.zeros (5, requires_grad = True)
>>>
>>> x = torch.tensor (t)
<stdin>:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
>>> x   # no requires_grad = True nor grad_fn
tensor([0., 0., 0., 0., 0.])

Furthermore, setting requires_grad = True on the new tensor doesn’t
undo the damage (although it might hide the problem for a while):

>>> y = torch.tensor (t, requires_grad = True)
>>> y   # has requires_grad = True, but "the computation graph is broken"
tensor([0., 0., 0., 0., 0.], requires_grad=True)
>>> y.sum().backward()
>>> t.grad   # no grad
>>>

If you need to build a new tensor directly from an existing one, you
should generally use clone() – and doing so preserves the computation
graph:

>>> z = t.clone()
>>> z   # has grad_fn, computation graph intact
tensor([0., 0., 0., 0., 0.], grad_fn=<CloneBackward0>)
>>> z.sum().backward()
>>> t.grad   # has grad
tensor([1., 1., 1., 1., 1.])

Also, “unbinding” your tensor into pieces with *p and then putting the
pieces back together again doesn’t help with this issue. You could
probably get *p to work, but I don’t see that it adds anything. To me,
it looks like unnecessary complication and opportunity for error.

It looks like your loss isn’t changing at all (at least not to the four
significant figures you print). This suggests that the parameters you are trying to optimize
have no (or zero) gradients. As you work on fixing things, you should
try performing just a single forward and backward pass, and check that
your parameters at least have gradients, and then check that a single
optimizer step does change your parameters, at least a little. Once you
get that sorted out, you can move on to the higher-level question of
whether your optimization is converging to a sensible solution.
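
A minimal sketch of such a check, reusing the names from your post (pade_model,
obj_fun, F, Y, sigma_Y), might look like the following. With your current
pade_approx() I would expect p.grad to come back as None:

p = pade_model["param"]
optimizer = torch.optim.Adam(params=[p], lr=1e-2)

loss = obj_fun(p, F, Y, sigma_Y)             # single forward pass
loss.backward()                              # single backward pass
print(p.grad)                                # should be a tensor of values, not None

p_before = p.detach().clone()
optimizer.step()
print((p.detach() - p_before).abs().max())   # should be > 0 after one step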

(As a stylistic aside, I would probably use the functional form of softplus()
rather than constructing a Softplus function object and applying it.
They do the same thing, but in the context of your code the functional
version appears to me to be a bit more direct.)
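
For example (just showing the call, with torch.nn.functional aliased so it
doesn’t clash with your F data tensor):

import torch.nn.functional as nnF

a = nnF.softplus(p[0:norder + 1])
b = nnF.softplus(torch.concat([p[norder + 1:2 * norder + 1], torch.tensor([1.0])], dim = 0))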

Best.

K. Frank

Yaaaay, thanks Frank! I got it to work following your instructions. I modified the function pade_approx by taking away the torch.tensor() call on p and removing the asterisk (and used a learning rate of 1e-1).

# rational function
def pade_approx(x, p):  
    norder = int((len(p)-1)/2)
    a =  torch.nn.Softplus()(p[0:norder+1])
    b =  torch.nn.Softplus()(torch.concat([p[norder+1:2*norder+1], torch.tensor([1])], axis = 0))
    Ypa = polyval(a,torch.sqrt(1j*x))/polyval(b,torch.sqrt(1j*x))
    return torch.concat([(Ypa).real, (Ypa).imag], dim = 0)
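
The call inside obj_fun also changes to match the new signature, passing the
parameter tensor directly instead of unpacking it:

    # inside obj_fun:
    y_model = pade_approx(x, p)   # was: pade_approx(x, *p)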

#2022-06-06 10:24:57.098570
#0: loss=5.869e+13
#10000: loss=2.850e+07
#20000: loss=1.829e+06
#30000: loss=5.133e+05
#40000: loss=7.387e+01
#50000: loss=5.549e+00
#60000: loss=5.538e+00
#70000: loss=5.524e+00
#80000: loss=5.503e+00
#90000: loss=5.484e+00
#0:01:48.755148

and the loss decreased. I also obtained parameters that fit the data, but some of them came out negative. Am I using softplus incorrectly? I know torch does not do constrained optimization, but I would like to get only positive parameters. Is there a way to do this?

#tensor([-13.8513, -10.6297,  -6.4025,  -7.9830,  -7.9595, -33.1631,  -8.8065,
#         -4.4100,  12.1742, 116.4665,  26.3406])

Hi Richard!

If I understand correctly what you are doing, things should be fine.

Roughly speaking, you have p, some “raw” parameters, that you
optimize, that can and do become negative.

Then you have p_pade = softplus (p), the actual parameters of your
Pade approximant, that will be strictly positive because you have passed
them through softplus(). If your best fit occurs when one of the p_pade
is close to zero, then your optimizer will drive the corresponding element
of p negative (while p_pade remains positive).
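
For instance, applying softplus() to the first few raw values you posted gives
small, but strictly positive, Pade coefficients:

p_raw = torch.tensor([-13.8513, -10.6297, -6.4025])   # first three raw parameters from your fit
p_pade = torch.nn.functional.softplus(p_raw)
# p_pade is strictly positive -- roughly [9.6e-07, 2.4e-05, 1.7e-03] --
# so a very negative raw value just means that coefficient is being driven
# toward zero.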

If this is what is going on, it is logically consistent, and should be okay for
your use case.

Best.

K. Frank

Exactly! Your explanation is spot on. I understand it clearly now.