Fitting a sine curve using PyTorch optimizers

Hello

I want to use PyTorch to fit a sine curve.
I have points sampled from a sine curve with a given frequency f_truth, and a sine function with a trainable parameter f.

However, the fit does not work properly: the Adam optimizer gets stuck in the local minimum closest to the initial frequency value f_0.

Could you please help me understand what might be causing this problem?

Below are the code and some figures that might help illustrate the problem.

Attachments

The code:

import torch

def sin_function(x, f):
    return torch.sin(f * x)

# Ground truth: 10 points sampled from sin(f_truth * x) on [0, 6]
x_data = torch.linspace(0, 6, 10, dtype=torch.float32)
f_truth = 1
y_data = sin_function(x_data, f_truth)

# Trainable frequency, initialised away from the true value
f_0 = 1.8
f = torch.tensor(f_0, requires_grad=True)

loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam([f], lr=0.2)

# Fit f by minimising the MSE between the predicted and true curves
num_epochs = 100
for epoch in range(num_epochs):
    y_pred = sin_function(x_data, f)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

You can plot the loss as a function of f and see that, starting from f_0 = 1.8, gradient descent moves in the wrong direction.

E.g.

import matplotlib.pyplot as plt

fs = [x / 50 for x in range(100)]  # candidate frequencies from 0 to 2
losses = [loss_fn(torch.sin(f * torch.linspace(0, 6, 20)), torch.sin(torch.linspace(0, 6, 20))).item() for f in fs]
plt.plot(fs, losses)

This shows that any f_0 greater than roughly 1.75 or smaller than about 0.3 will move in the wrong direction, away from the global optimum.

Looking at your own plot of the input data and the predicted curve, think about how close the peaks and troughs of the predicted sine curve are to (points on) the actual sine curve, and how much farther they would temporarily have to move away from those points before finally arriving at the optimal solution.

Thanks for your answer.

I do understand that plain gradient descent would work here only if the initial parameter value is close enough to the global minimum. However, I would expect more sophisticated optimizers to handle the local-minima problem.

For instance, there is an example of PyTorch’s built-in Adam optimizer escaping a local minimum: (image).

So I suppose that in my particular case there should also be a way to solve the problem under less strict initial conditions, and the question is how to do it.

In fact, after some empirical experiments, I found that increasing the learning rate to values around 2-3 made it possible to find the global minimum.
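
For reference, a minimal sketch of that experiment: the same loop as above with only the learning rate changed (whether it escapes the local minimum can still depend on the run):

# Same setup as above; only the learning rate is increased (illustrative sketch)
f = torch.tensor(f_0, requires_grad=True)
optimizer = torch.optim.Adam([f], lr=2.5)  # lr in the 2-3 range mentioned above

for epoch in range(num_epochs):
    y_pred = sin_function(x_data, f)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f.item())  # in the experiments described above this ended up near f_truth = 1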

That’s right: there are cases where an optimizer step can jump right past a local minimum. A high learning rate can cause that, and so can a momentum term. However, a high learning rate or momentum can also cause problems of its own: you want to step past local minima, but not past the global one. In most situations, you wouldn’t want to choose a learning rate greater than one.
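
To make that trade-off concrete, here is a small illustrative sketch (not from the original posts) that reruns the same fit with a few learning rates and prints where f ends up. The exact outcomes depend on the run; it is only meant to show the tension between stepping past local minima and overshooting the global one:

import torch

def fit(lr, f_0=1.8, num_epochs=100):
    # Same toy problem as in the original post: fit f in sin(f * x) to sin(x)
    x_data = torch.linspace(0, 6, 10)
    y_data = torch.sin(x_data)  # f_truth = 1
    f = torch.tensor(f_0, requires_grad=True)
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.Adam([f], lr=lr)
    for _ in range(num_epochs):
        loss = loss_fn(torch.sin(f * x_data), y_data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return f.item()

# Compare a small, a moderate, and a large learning rate
for lr in (0.2, 1.0, 2.5):
    print(f"lr={lr}: final f = {fit(lr):.3f}")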