Thank you for the LSTM threads; I'm learning so much from them!
(This one and the more recent one, but I felt this fit better here.)
A few observations that may or may not be interesting regarding the PyTorch example (in particular with full-batch training):
- At least with single precision (on CUDA), a lower loss does not necessarily mean nicer-looking predictions: at a loss around ~1e-4 I see both good and bad fits.
- I would suspect single precision is part of the issue, given that the example is done in double precision.
- After switching from LBFGS to Adam, training seems to converge similarly.
- I have not been entirely successful running double precision on CUDA. (A sketch of the variations I tried is below.)
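For reference, here is roughly the kind of variation I have been testing. This is only a minimal sketch, not the example verbatim: the `Seq` model, the toy sine data, the learning rate, and the step count are stand-ins of my own; the relevant bits are the `dtype`/`device` settings and the LBFGS-to-Adam swap (PyTorch's `optim.LBFGS` requires a closure, `Adam` does not).

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float32  # the example uses double precision; switch to torch.float64 to compare

class Seq(nn.Module):
    """Stand-in for the example's LSTMCell-based sequence model."""
    def __init__(self, hidden=51):
        super().__init__()
        self.hidden = hidden
        self.lstm = nn.LSTMCell(1, hidden)
        self.linear = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, time)
        h = torch.zeros(x.size(0), self.hidden, device=x.device, dtype=x.dtype)
        c = torch.zeros_like(h)
        outs = []
        for step in x.split(1, dim=1):  # step: (batch, 1)
            h, c = self.lstm(step, (h, c))
            outs.append(self.linear(h))
        return torch.cat(outs, dim=1)

# Toy sine data standing in for the example's dataset: predict the next sample.
t = torch.linspace(0, 20, 200)
x = torch.sin(t).repeat(8, 1).to(device=device, dtype=dtype)
inp, target = x[:, :-1], x[:, 1:]

model = Seq().to(device=device, dtype=dtype)
criterion = nn.MSELoss()

# The LBFGS -> Adam swap (learning rate is my own guess, not from the example).
optimizer = optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(500):  # full-batch steps, as in the example
    optimizer.zero_grad()
    loss = criterion(model(inp), target)
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.2e}")
```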
Does that match your experience? What would you conclude, particularly regarding the first point?