Neural Network cannot approximate sin function

I’m trying to learn how to use PyTorch and wanted to start with a really simple test.
So I’ve created the function f(x) = np.sin(x * 50) * 500 + x * 2000,
and I want a feedforward neural network to approximate it on the range [0, 1].

I’ve created 5000 evenly spaced samples and trained the network on all of them. I don’t care if it overfits; it’s just for testing.
From my tests it seems like my network only approximates a linear function or the mean of my samples…
I’ve tried to play with the number of layers, but even a network with 3 layers and around 100 neurons wasn’t able to find a good solution.

So I’m pretty sure I’m using PyTorch incorrectly, but I cannot find my error.
Below you can find my source code; I think the most important part is the two for loops that contain the training procedure.

Code
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# the target function
def func(x):
    return np.sin(x * 50) * 500 + x * 2000

# plot the target function
x_steps = np.linspace(0, 1, 5000)
y_steps = func(x_steps)
plt.plot(x_steps, y_steps)

# Create the model, optimizer and the loss function
model = nn.Sequential(
                    nn.Linear(1, 64),
                    nn.ReLU(),
                    nn.Linear(64, 32),
                    nn.ReLU(),
                    nn.Linear(32, 1))
optimizer = optim.Adam(model.parameters(), lr=0.1)
loss_func = nn.MSELoss()

# Create random index "batches"
indexes = np.arange(len(x_steps), dtype=np.int64)
np.random.shuffle(indexes)
batches = torch.split(torch.from_numpy(indexes), 64)

# Training
for epoch in range(100):
    for batch in batches:
        x_batch = x_steps[batch]
        y_batch = y_steps[batch]
        
        # forward pass on the current mini-batch
        prediction = model(torch.Tensor(x_batch.reshape((-1, 1))))
        loss = loss_func(prediction, torch.Tensor(y_batch.reshape((-1, 1))))

        # backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print(loss.item())

# Test if the function was approximated
predictions = model(torch.Tensor(np.reshape(x_steps, (-1,1))))
plt.plot(x_steps, y_steps)
plt.show()
plt.plot(x_steps, predictions.detach().numpy())
plt.show()

My PyTorch example could help you.

How long did you train the network? Maybe the L1 loss would be a better choice in this case?
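
If you want to try that, swapping the criterion is the only change needed. A minimal, self-contained sketch of nn.L1Loss (mean absolute error):

import torch
import torch.nn as nn

loss_func = nn.L1Loss()  # mean absolute error instead of MSE

# quick sanity check on dummy values
pred = torch.tensor([[1.0], [2.0]])
target = torch.tensor([[1.5], [0.0]])
print(loss_func(pred, target))  # tensor(1.2500)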

I’m currently training the network with a training set of 5000 samples and a batch size of 64 for around 200-500 epochs.
Additionally, the training data has no noise.

I thought it would be simple to approximate this data. I’ve tried it with L1Loss and MSELoss, and I also switched between SGD and Adam, but all of them either converge to a linear function or to the mean, even when I increase the training data and the number of epochs.

@Tony-Y I appreciate your example, but it seems like you additionally use the derivative?
After changing the target function to x^2, my code seems to work fine. Even some other functions are approximated perfectly. Does that mean that neural networks are not able to easily approximate periodic functions?

This should usually be sufficient.

Theoretically, they should be able to approximate any function according to the universal approximation theorem. If I have some time later today, I’ll try to get it working myself.

My example uses the derivative because I considered a problem with force data in addition to potential data. My example uses only 100 sample points because it uses both potential and force data. If you use 5 layers and 500 hidden units for your model, like my code does, and train it for several thousand steps (20,000 steps for my code), can your modified model learn the sine function?
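
For illustration, a wider and deeper model in the same nn.Sequential style as your code could look like this (reading the suggestion as 5 hidden layers of 500 units each; the exact layout is up to you):

import torch.nn as nn

# Sketch: 5 hidden layers with 500 units each, same style as the original model.
# Only the depth and width follow the suggestion above; the exact layout is an assumption.
model = nn.Sequential(
    nn.Linear(1, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 1))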

After some testing I’ve realized that my learning rate was always too small.
When setting the learning rate to 0.1, the Adam optimizer was able to approximate the function well.

The small learning rate was probably responsible for the optimizer getting stuck in a local optimum from which it could not escape.
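
For anyone running into the same issue, the only change relative to PyTorch’s defaults is the learning rate passed to Adam (the default is 1e-3); a small stand-in model is used here just to make the snippet self-contained:

import torch.nn as nn
import torch.optim as optim

# stand-in model just to make the snippet runnable
model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

# Adam's default learning rate is 1e-3; with targets on the order of thousands,
# the larger step size of 0.1 escaped the "predict the mean" solution.
optimizer = optim.Adam(model.parameters(), lr=0.1)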
