Beginner question about sine estimation

Dear all,

I'm fairly new to PyTorch and neural networks, but out of curiosity I have started experimenting with PyTorch.

This might be a silly question given my level of understanding, but I'm trying (and failing miserably) to use PyTorch for a simple matter of sine-wave phase and amplitude estimation. I feed the network a sine wave over one period only, so estimating its amplitude is simply the max of the input values, while its phase can be derived from the location of that max and the first value fed in.
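(For concreteness, here is a rough sketch of that closed-form idea; it uses the location of the peak rather than the first sample, and the numbers are purely illustrative.)

import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
amp, phase = 2.0, 1.3                       # example ground-truth values (illustrative only)
y = amp * torch.sin(x + phase)              # one period of the sine wave

amp_est = y.max()                           # amplitude = peak value over one period
idx = torch.argmax(y)                       # sample index of the peak
phase_est = (math.pi / 2 - x[idx]) % (2 * math.pi)  # sin peaks where x + phase = pi/2 (mod 2*pi)
print(amp_est, phase_est)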

I started from PyTorch's sine example.

What I was hoping was that, once the network is trained, it would readily estimate the amplitude and phase of a sine wave given that sine wave as input.

Here is the code:

import torch
import math
import matplotlib.pyplot as plt

x = torch.linspace(-math.pi, math.pi, 2000)

model = torch.nn.Sequential(
    torch.nn.Linear(2000, 15),
    torch.nn.Linear(15, 2),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

amp = 2
phase = 2 * 3.14 * torch.rand(1)
y = amp * torch.sin(x + phase)

plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
line1, = ax.plot(x, y)
line2, = ax.plot(x, y, 'r')
fig.canvas.draw()
fig.canvas.flush_events()

learning_rate = 1e-3
optimizer = torch.optim.SGD(model.parameters(), lr=5.5e-8, momentum=0.95)
for t in range(200000):

    if t % 50 == 0:
        # Draw a new random target sine wave every 50 steps
        amp = 2 * torch.rand(1)
        phase = 2 * 3.14 * torch.rand(1)
        y = amp * torch.sin(x + phase)
        line1.set_ydata(y.detach().numpy())

    # The network predicts [amplitude, phase] from the sampled sine wave
    amp_phase = model(y)

    # Rebuild a sine wave from the predicted amplitude and phase
    y_pred = amp_phase[0] * torch.sin(x + amp_phase[1])

    # Compute and print loss.
    # loss = loss_fn(y_pred, y)
    loss = torch.sum((y_pred - y) * (y_pred - y))
    if t % 100 == 99:
        print(t, loss.item())
        print('true', amp, phase)
        print('estimated', amp_phase)
        line2.set_ydata(y_pred.detach().numpy())
        fig.canvas.draw()
        fig.canvas.flush_events()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Is this a correct approach?

Hi Silk!

I haven’t looked at all of your code, but I do see one clear problem (below).

Your Sequential chains two Linears together without any intervening
non-linear “activation.” This becomes equivalent to a single layer, namely
Linear (2000, 2).

You want something like:

model = torch.nn.Sequential (
    torch.nn.Linear (2000, 15),
    torch.nn.ReLU(),
    torch.nn.Linear (15, 2),
)

(or Tanh or other non-linear activation).

I’m only guessing here, but you probably also want a deeper network,
that is, one with more layers. So you might want three Linears (or maybe
more), again separated by non-linear activations.
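
For instance, a deeper variant might look like the following (just a sketch; the hidden widths 64 and 15 are arbitrary choices, not anything prescribed):

model = torch.nn.Sequential(
    torch.nn.Linear(2000, 64),   # first hidden layer
    torch.nn.ReLU(),
    torch.nn.Linear(64, 15),     # second hidden layer
    torch.nn.ReLU(),
    torch.nn.Linear(15, 2),      # outputs [amplitude, phase]
)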

Best.

K. Frank

Hi Frank,

Thanks for your reply. I will try your solutions.
I still have another "beginner" question regarding neural networks in general. It seems that the processes behind a network are linear (at least most of the time), so how can a network learn something that is not directly related to a weighted sum of the input (through one or more layers)?

Let's say: is it possible to build a network that would do nothing more than find the maximum value of an input vector built from random Gaussian noise? How would you do that?

Hi Silk!

It really isn't true that networks are (mostly) linear. And if it were true, networks would be much less
broadly useful.

I think it might look like networks are “mostly linear” because we
have these big, multidimensional, fully-connected layers with lots of
parameters, whereas the nonlinearities (usually) occur in small,
“simple,” element-wise nonlinear “activations.” That is, we use
linear layers to mix elements together (by adding them), and only
introduce nonlinearities element-wise.

But the linear layers and the activations work together to produce
much more general nonlinearities.

You might ask why we don’t “mix” elements together by multiplying
them instead of adding them. But in fact we do. Consider the
nonlinearity f (x) = x**2. If we mix a and b by adding them,
a + b, then passing the mixture through the nonlinearity gives
f (a + b) = a**2 + 2 * a * b + b**2. Note the a * b term.
We have mixed a and b together by multiplying them – we just
used a linear layer together with an element-wise nonlinearity to
do so.
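
Here is a tiny numerical illustration of that point (a sketch added for clarity; it uses plain tensor arithmetic rather than an actual Linear layer):

import torch

a, b = torch.tensor(3.0), torch.tensor(5.0)
mixed = a + b            # the linear "mixing" step (a weighted sum with weights 1 and 1)
activated = mixed ** 2   # element-wise nonlinearity f(x) = x**2
# a**2 + 2*a*b + b**2 = 9 + 30 + 25 = 64: the cross term 2*a*b shows up
print(activated)         # tensor(64.)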

One could imagine a neural-network framework where the “mixing”
layers also included cross-products as well as sums. I don’t really
know the history of neural networks, but I imagine that making such
a cross-product layer general would require many more parameters
and that it turns out to be more manageable and efficient to rely on
the linear-layer / element-wise-nonlinearity division of labor to
generate those cross-products (and higher-order terms).

So, to answer your first question: the "weighted sums" together with the nonlinear activations
are able to create products of elements (as well as higher-order terms).

As for your maximum-finding example: build a network with at least one "hidden layer" (and probably
more), but, of course, with nonlinear activations in between the linear layers.

However, this problem is not a good example. If the goal, to choose
something precise, is to find the index of the largest value in the
input vector (compare to predicting the class label in a multi-class
classification), then you just use an “empty” network that does nothing,
and calculate the argmax() of the “output” vector (which is equal to
the input vector).
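
In code, that "empty network" amounts to nothing more than this (a minimal sketch):

import torch

x = torch.randn(2000)    # input vector of Gaussian noise
idx = torch.argmax(x)    # index of the largest value; no learned parameters needed
max_val = x[idx]         # the maximum value itself, same as x.max()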

We could set up a really artificial problem where we try to forbid the
network somehow from using this obvious information, but doing so
would just confuse the issue.

Best.

K. Frank

Hi Frank,

Thanks a lot for your very clear answers to my questions! You are really helping me understand neural networks.

Silk