Function approximation fails miserably

I’m generating a sine wave and trying to approximate it, but the network refuses to learn it.
I’ve tried changing the activation functions, the learning rate, and the number of nodes per layer, but nothing seems to work.
I’d really appreciate any suggestions for changes.
Thanks!

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import random

dataset = []

# 400 samples of 10*sin(x) on [0, 10]
for i in np.linspace(0,10,400):
    dataset.append([i,10*np.sin(i)])
shuffled = dataset.copy()
random.shuffle(shuffled)

# per-sample squared error
def criterion(out, label):
    return (label - out)**2

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1,4)
        self.fc2 = nn.Linear(4,4)
        self.fc3 = nn.Linear(4,1)
        
    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)
        return x
err=[]
net = Net()
optimizer = optim.SGD(net.parameters(), lr= 0.00001)

for epoch in range(1600):
    for data in shuffled:
        x = torch.FloatTensor([data[0]])
        y = torch.FloatTensor([data[1]])
        optimizer.zero_grad()
        out = net(x)
        loss = criterion(out,y)
        loss.backward()
        optimizer.step()
        
    print(loss.item(), epoch)   # loss of the last sample seen in this epoch
    #random.shuffle(shuffled)
    err.append(loss.item())
    '''if loss.item() < 0.0001:
        break'''
kek = []

#testing out the net
for i in np.linspace(0,10,550):
    val = float(net(torch.FloatTensor([i])))
    kek.append(val)

It is hard for stacks of Linear layers to learn periodic functions like sine and cosine. An RNN would work better in these cases. Here is an example of using an LSTM to approximate sine waves.
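
As an illustration of the idea (a minimal sketch only, not the exact example referenced above; hidden size, learning rate and epoch count here are arbitrary): treat the wave as a sequence and let an LSTM predict the next sample from the previous ones.

import torch
import torch.nn as nn

class SineLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out)                 # one prediction per time step

t = torch.linspace(0, 10, 400)
wave = (10.0 * t.sin()).view(1, -1, 1)        # (1, 400, 1)
inp, target = wave[:, :-1], wave[:, 1:]       # predict sample k+1 from samples up to k

model = SineLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(inp), target)
    loss.backward()
    opt.step()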

Edit: I just noticed that your network also has a very limited number of parameters (only 4 units per hidden layer), while the example you saw has 20. This is another important problem. What I mean is that, as my example and yours demonstrate, an RNN can approximate sine functions better than fully connected layers: this is also why we don’t use FC layers for sequence modeling.
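
For a concrete sense of how small that is, here is a quick (purely illustrative) way to count the trainable parameters of the Net defined in the question:

# assumes the Net class from the question above is in scope
def count_params(m):
    return sum(p.numel() for p in m.parameters())

print(count_params(Net()))   # 33 parameters in total for the 1-4-4-1 net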

But doesn’t the universal approximation theorem state that one hidden layer in a feed-forward net is enough?
The user at https://github.com/thomberg1/UniversalFunctionApproximation seems to be successful. I’m really confused about what’s going on.

One hidden layer is enough, if you have enough capacity in the net. In practice that means you usually have to make the net wider or add more layers.
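
To be precise about what the theorem promises (my paraphrase): for any continuous f on a compact set K and any tolerance ε > 0, there exist a width m and weights such that

    \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{m} c_i \, \sigma(w_i x + b_i) \Big| < \varepsilon

It only guarantees that such weights exist for some sufficiently large m; it says nothing about m being as small as 4, and nothing about SGD actually finding those weights.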

The training of your net has trouble converging because of its small size and because it uses a batch size of one.

Here are my modifications:

  • I backprop through your entire training data as a single batch on each epoch.
  • I made your two layer net wider.
  • I tried different optimizers - RMSprop seems to work the best.
  • I print the mean validation loss after training.
import torch
import torch.nn as nn
import torch.optim as optim
from random import shuffle

class Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1,50)
        self.fc2 = nn.Linear(50,50)
        self.fc3 = nn.Linear(50,1)
    def forward(self, x):
        x = self.fc1(x).tanh()
        x = self.fc2(x).tanh()
        x = self.fc3(x)
        return x


class Net3(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1,5)
        self.fc2 = nn.Linear(5,5)
        self.fc3 = nn.Linear(5,5)
        self.fc4 = nn.Linear(5,1)
    def forward(self, x):
        x = self.fc1(x).tanh()
        x = self.fc2(x).tanh()
        x = self.fc3(x).tanh()
        x = self.fc4(x)
        return x


net = Net2()

# optimizer = optim.SGD(net.parameters(), lr=1.0e-3)
# optimizer = optim.Adam(net.parameters(), lr=1.0e-3)
optimizer = optim.RMSprop(net.parameters(), lr=1.0e-2)

dataset = [(x,x.sin().mul(10.0)) for x in torch.linspace(0,10,400)]

for epoch in range(1600):
    shuffle(dataset)
    err=[]
    optimizer.zero_grad()
    for x,y in dataset:
        out = net(x.unsqueeze(0))
        err.append((out - y).pow(2.0))
    loss = torch.stack(err).mean()
    loss.backward()
    optimizer.step()
    print(f"[{epoch:4d}] {loss.data}")


#testing out the net
with torch.no_grad():
    val = []
    for z in torch.linspace(0,10,550):
        out = net(z.unsqueeze(0))
        val.append((out - z.sin().mul(10.0)).pow(2.0))
    loss = torch.stack(val).mean()
    print(f"validation loss = {loss}")

Yeah this seems to work. Thank you so much for the help!