Why is my neural network having trouble predicting the next value of a sin wave?

Why is my neural network not able to predict the next number for a sin wave?

I don’t know if I need a better loss function or what the issue is. It seems to optimize for about 500 steps and then it just flounders with predictions that look nothing like a wave.

The model takes 120 inputs: the previous 120 values of the sine wave. It is asked to predict the next value. I store the sine wave values in a deque and append each new value to the end of it.

The targets and the predictions are arrays of shape (200,).
Each index stands for one value between -1 and 1, in steps of one hundredth.
Indexes 0 to 100 represent the values from -1 to 0.
Indexes above 100 represent the values between 0 and 1.
In other words, the array has 200 slots that show the neural network what to target and that hold its prediction.

The code (which I had broken up in a Jupyter Notebook)…

%matplotlib inline

import torch
import random
import numpy as np
from collections import deque
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
from torch import nn
import time
import math

plt.rcParams["figure.figsize"]=(12, 8)  

class Network(nn.Module):
    def __init__(self):
        # nn.Module must be initialized before submodules can be assigned.
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(120, 136),
            nn.Linear(136, 146),
            nn.Linear(146, 156),
            nn.Linear(156, 170),
            nn.Linear(170, 188),
            nn.Linear(188, 200))  

    def forward(self, x):
        return self.net(x)  

# Pick the device first, then move the network to it.
device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")
online_net = Network().to(device)

optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-5)

sensor_buffer = deque(maxlen=120)
action_buffer = deque(maxlen=120)

# Seed both buffers with 120 initial values (each deque holds at most 120).
sensor_buffer.extend(np.mean([random.random() for _ in range(4)]) for _ in range(120))
action_buffer.extend(0 for _ in range(120))

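def format_target(n):
    # Map n in [-1, 1] to an index in [0, 199]: index 0 is -1.00, index 100 is 0.00.
    # This matches the decoding used for n_prediction below: (index * 0.01) - 1.
    # (Using int(n * 100) directly would give negative, wrapped-around indexes for n < 0.)
    idx = min(int(round((n + 1) * 100)), 199)
    l = [-1 for o in range(200)]
    l[idx] = 1
    return l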

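adder = 0.0  # running phase of the sine wave; not initialized in the original excerpt

for steps in range(1400):
    # Advance the phase by a random increment; sin(adder) is the next value to predict.
    adder += (random.random() * 0.2)
    sensor_sinwave = np.sin(adder)

    # Input: the sines of the 120 phases currently in the buffer.
    outers = torch.tensor(np.sin(sensor_buffer), dtype=torch.float32, device=device)

    outer = online_net(outers)

    # Decode the arg-max index back to a value between -1 and 1.
    prediction = torch.argmax(outer).item()
    n_prediction = (prediction * 0.01) - 1

    target = format_target(sensor_sinwave)
    target_t = torch.as_tensor(target, dtype=torch.float32, device=device)

    loss = nn.functional.smooth_l1_loss(outer, target_t)

    # Update step and buffer insert (missing from the excerpt; assumed from the description above).
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    sensor_buffer.append(adder)

    if steps % 150 == 0:
        print(loss.item(), n_prediction, sensor_sinwave)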

[Image: the sine wave I’m trying to predict]

[Image: my predictions]

Hi Alio!

I think I understand what you are doing (but I’m not entirely sure …).

Without commenting specifically on your code or proposed network
architecture, let me say the following:

First, there are clearly better ways to predict the next number in a
sine wave than using a neural network, so this is, in any event, a
learning exercise.

However, unless you are purposely trying to use an atypical approach
(as a learning exercise), it would be much more natural to proceed as
follows:
You are trying to predict a single number – the next sine value. So
have the output of your network be a single number. You would
typically do this by having the final Linear layer in your model have
out_features = 1. Similarly, your target should also be a single
number – the actual sine value you are trying to predict. MSELoss
would likely be the most natural loss function to use.
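
As a rough illustration of the single-output approach, here is a minimal
sketch. It is not taken from the question's code: the fixed sampling step,
the hidden size of 64, the Tanh activation, and the learning rate are all
assumptions chosen only to keep the example small. It runs on the CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)

WINDOW = 120  # number of previous samples fed to the model, as in the question

# A sine wave sampled at a fixed step (an assumption; the question uses random steps).
t = torch.arange(0, 2000, dtype=torch.float32) * 0.1
wave = torch.sin(t)

# Sliding windows: each row holds 120 consecutive samples,
# and the target is the single sample that follows them.
inputs = wave.unfold(0, WINDOW, 1)[:-1]   # shape (1880, 120)
targets = wave[WINDOW:].unsqueeze(1)      # shape (1880, 1)

model = nn.Sequential(
    nn.Linear(WINDOW, 64),   # hidden size is an arbitrary illustrative choice
    nn.Tanh(),
    nn.Linear(64, 1),        # out_features = 1: one predicted value
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

initial_loss = loss_fn(model(inputs), targets).item()
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

print(initial_loss, final_loss)
```

Here the target is a plain float tensor with the same shape as the model
output, so no encoding or decoding step is needed.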

What you are doing instead is, in essence, one-hot encoding which
of 200 values ranging from -1 to 1 is closest to your actual sine value.
(Note, the issue here isn’t really the discreteness – it’s the one-hot
encoding.) You then have your network predict this one-hot-encoded
vector by outputting a predicted vector of length 200.

This has two conceptual problems: First, you are burdening your network
with predicting 200 values when, in point of fact, your problem is to
predict just a single value.

Worse – in my mind – is that you penalize your network equally when
it is almost right as when it is very wrong. Here’s what I mean by this:

Let’s say the correct target value is 0.79. If you predict a single value
of 0.78 (and use MSELoss), you’ll get a very low loss, which is good
because you were almost right. On the other hand, if you predict -0.84,
you’ll get a significantly higher loss, which is also good, because your
prediction was quite wrong. Sure, you’d rather predict exactly 0.79,
and you will get a lower loss (in fact, 0.0 for the loss) if you do, but
predicting 0.78 or 0.80 is still very good, and your network will train
much more efficiently if you provide it with this information by penalizing
such almost-right predictions with only a small loss.
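
To put numbers on this: for a single predicted value, MSELoss reduces to
the squared difference between prediction and target. The small sketch
below uses plain Python for the arithmetic; the helper name squared_error
is just for illustration.

```python
def squared_error(prediction, target):
    """Squared difference; what MSELoss computes for a single value."""
    return (prediction - target) ** 2

target = 0.79
loss_close = squared_error(0.78, target)   # almost right: about 0.0001
loss_far = squared_error(-0.84, target)    # quite wrong: about 2.66

print(loss_close, loss_far)
```

The quite-wrong prediction is penalized more than four orders of magnitude
harder than the almost-right one, which is the gradient signal you want.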

Let me now artificially simplify your scheme to make the main point: Let’s
say that your predictions (the length-200 vectors) always consist of 199
-1.0 values and one +1.0 value. The value 0.79 gets encoded in your
target as having element 79 be set to +1.0. So the correct prediction is
the same. But let’s say (in the context of my artificial simplification) that
your network outputs a prediction with element 78 set to +1.0 (and all of
the others equal to -1.0). This corresponds to predicting a value of 0.78,
which is almost right. But, in your scheme, such a prediction is penalized
just as much as a quite-wrong prediction of -0.84.
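
The simplified scenario above can be checked directly with a short sketch.
The helper one_hot_pm1 and the index 16 used for the quite-wrong prediction
are made up for this illustration; any index other than 79 gives the same
result.

```python
import torch
from torch import nn

def one_hot_pm1(index, size=200):
    """Vector of -1.0 values with a single +1.0 at `index`."""
    v = torch.full((size,), -1.0)
    v[index] = 1.0
    return v

target = one_hot_pm1(79)        # "0.79" in the simplified scheme
almost_right = one_hot_pm1(78)  # "0.78"
quite_wrong = one_hot_pm1(16)   # stands in for the far-off prediction

loss_almost = nn.functional.smooth_l1_loss(almost_right, target).item()
loss_wrong = nn.functional.smooth_l1_loss(quite_wrong, target).item()

print(loss_almost, loss_wrong)
```

In both cases exactly two of the 200 elements differ from the target (each
by 2.0), so the two losses come out identical.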

Now I’ve exaggerated the situation by assuming that your predictions
have one +1.0 value with all of the others equal to -1.0, but, still, this
effect of the prediction having to be exactly correct to be considered
“good” presents a much more difficult learning task for your network.

As an aside, if you do want to pursue your approach of using targets
that are one-hot-encoded, you could think about your prediction task
as a classification problem – each of your 200 discretized values for
the next sine value is a class, and you want to predict that class. If
you choose to do this, you will find that using CrossEntropyLoss will
lead to better training than smooth_l1_loss() (or the conceptually
similar MSELoss). The logarithmic divergence that CrossEntropyLoss
provides for bad predictions does seem to fit well with classification
problems and leads to better training.
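
If you go the classification route, a minimal sketch might look like the
following. Note that CrossEntropyLoss takes the raw network outputs
(logits) and an integer class index as the target, not a one-hot vector.
The helper value_to_class, the hidden size, and the ReLU activation are
assumptions for illustration; the index convention (index 0 is -1.00,
index 100 is 0.00) follows the description in the question.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

NUM_CLASSES = 200

def value_to_class(value):
    """Map a value in [-1, 1] to a class index in [0, 199] (index 0 is -1.00)."""
    return min(int(round((value + 1.0) * 100)), NUM_CLASSES - 1)

model = nn.Sequential(
    nn.Linear(120, 64),          # hidden size is an arbitrary illustrative choice
    nn.ReLU(),
    nn.Linear(64, NUM_CLASSES),  # one logit per discretized value
)
loss_fn = nn.CrossEntropyLoss()

# One input window of 120 samples and the sample that follows it.
window = torch.sin(torch.arange(120, dtype=torch.float32) * 0.1).unsqueeze(0)
next_value = math.sin(120 * 0.1)

# Target is a class index with shape (batch,) and dtype int64.
target_class = torch.tensor([value_to_class(next_value)])

logits = model(window)                # shape (1, 200); no softmax applied here
loss = loss_fn(logits, target_class)

# Decode the most likely class back to a value.
predicted_value = logits.argmax(dim=1).item() * 0.01 - 1.0
print(loss.item(), predicted_value)
```

The usual zero_grad / backward / step update would follow the loss
computation during training.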

Good luck.

K. Frank

You could also look for inspiration in the venerable time sequence prediction example:

Also, there are some discussions about adapting this on the forum.

Best regards