RNN for generating time series

I’m trying to modify the word_language_model example to generate a time series. My naive approach was to replace the softmax output with a single linear output layer and to change the loss function to MSELoss. Unfortunately, my network seems to learn to output the current input instead of predicting the next sample, so when I try to generate a new time series the network soon gets stuck at a fixed point. Any suggestions on how to improve my model? Here’s my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Train a LSTM network to generate a time series.
'''

import argparse
import collections
import csv
import math
import pickle
import time
import torch
import torch.nn as nn
from torch.autograd import Variable


def read_arguments():
    parser = argparse.ArgumentParser(description='Train a recurrent network to generate a time series.')
    parser.add_argument('--data', type=str, default='data.txt',
                        help='data file to read (CSV)')
    parser.add_argument('--model', type=str, default='LSTM',
                        help='type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)')
    parser.add_argument('--nhid', type=int, default=100,
                        help='number of hidden units per layer')
    parser.add_argument('--nlayers', type=int, default=2,
                        help='number of layers')
    parser.add_argument('--lr', type=float, default=.05,
                        help='initial learning rate')
    parser.add_argument('--clip', type=float, default=5,
                        help='gradient clipping')
    parser.add_argument('--epochs', type=int, default=10,
                        help='upper epoch limit')
    parser.add_argument('--batch-size', type=int, default=10, metavar='N',
                        help='batch size')
    parser.add_argument('--bptt', type=int, default=375,
                        help='sequence length')
    parser.add_argument('--checkpoint-interval', type=int, default=10, metavar='N',
                        help='interval to save intermediate models')
    parser.add_argument('--save', type=str,  default='model',
                        help='path to save the final model')
    args = parser.parse_args()
    return args


class RNNModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, rnn_type, nhid, nlayers):
        super(RNNModel, self).__init__()
        self.rnn = getattr(nn, rnn_type)(1, nhid, nlayers)
        self.output = nn.Linear(nhid, 1)

        self.init_weights()

        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.output.bias.data.fill_(1.0)
        self.output.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        output_lstm, hidden = self.rnn(input, hidden)
        output = self.output(output_lstm.view(output_lstm.size(0)*output_lstm.size(1), output_lstm.size(2)))
        return output.view(output_lstm.size(0), output_lstm.size(1), output.size(1)), hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters()).data
        if self.rnn_type == 'LSTM':
            return (Variable(weight.new(self.nlayers, bsz, self.nhid).zero_()),
                    Variable(weight.new(self.nlayers, bsz, self.nhid).zero_()))
        else:
            return Variable(weight.new(self.nlayers, bsz, self.nhid).zero_())


def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, str):
            for sub in flatten(el):
                yield sub
        else:
            yield el
        
        
def batchify(data, bsz):
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    if torch.cuda.is_available():
        data = data.cuda()
    return data            


def load_data(filename, batch_size):
    '''
    Load a training data sequence from a CSV file
    '''
    with open(filename) as csvfile:
        csvreader = csv.reader(csvfile)
        data = list(csvreader)
        data = torch.Tensor([float(x) for x in flatten(data)])

    train_length = math.ceil(len(data) * .7)
    val_length = math.ceil(len(data) * .2)
    train_data = data[:train_length]
    val_data = data[train_length:train_length+val_length]
    test_data = data[train_length+val_length:]
    return batchify(train_data, batch_size), batchify(val_data, batch_size), batchify(test_data, batch_size)
    
###############################################################################
# Training code
###############################################################################

def clip_gradient(model, clip):
    """Computes a gradient clipping coefficient based on gradient norm."""
    totalnorm = 0
    for p in model.parameters():
        modulenorm = p.grad.data.norm()
        totalnorm += modulenorm ** 2
    totalnorm = math.sqrt(totalnorm)
    return min(1, clip / (totalnorm + 1e-6))


def repackage_hidden(h):
    """Wraps hidden states in new Variables, to detach them from their history."""
    if type(h) == Variable:
        return Variable(h.data)
    else:
        return tuple(repackage_hidden(v) for v in h)


def get_batch(source, i, seq_length, evaluation=False):
    # The target is the input sequence shifted one step ahead, i.e. the
    # network is trained to predict the next sample.
    seq_len = min(seq_length, len(source) - 1 - i)
    data = Variable(source[i:i+seq_len].view(seq_len, -1, 1), volatile=evaluation)
    target = Variable(source[i+1:i+1+seq_len].view(-1))
    return data, target


def evaluate(data_source, model, criterion, batch_size, seq_length):
    total_loss = 0
    hidden = model.init_hidden(batch_size)
    for i in range(0, data_source.size(0) - 1, seq_length):
        data, targets = get_batch(data_source, i, seq_length, evaluation=True)
        output, hidden = model(data, hidden)
        total_loss += len(data) * criterion(output, targets).data
        hidden = repackage_hidden(hidden)
    return total_loss[0] / len(data_source)


def train(train_data, model, criterion, lr, batch_size, seq_length, grad_clip):
    total_loss = 0
    hidden = model.init_hidden(batch_size)
    for batch, i in enumerate(range(0, train_data.size(0) - 1, seq_length)):
        data, targets = get_batch(train_data, i, seq_length)
        hidden = repackage_hidden(hidden)
        model.zero_grad()
        output, hidden = model(data, hidden)
        loss = criterion(output, targets)
        loss.backward()

        # Manual SGD update with the clipped learning rate
        clipped_lr = lr * clip_gradient(model, grad_clip)
        for p in model.parameters():
            p.data.add_(-clipped_lr, p.grad.data)

        print('.', end='', flush=True)
        total_loss += loss.data
    return total_loss[0] / batch

def save_model(model, name, checkpoint = ''):
    filename = name + str(checkpoint) + '.pt'
    print('Saving model to', filename)
    with open(filename, 'wb') as f:
        torch.save(model, f)
    

def main():
    args = read_arguments()
    print("Loading data...")
    ecg_data, val_data, test_data = load_data(args.data, args.batch_size)
    print("Building network ...")
    ###############################################################################
    # Build the model
    ###############################################################################
    model = RNNModel(args.model, args.nhid, args.nlayers)
    if torch.cuda.is_available():
        print('Using CUDA')
        model.cuda()

    criterion = nn.MSELoss()

    print("Training network ...")
    try:
        lr = args.lr
        ci = args.checkpoint_interval
        filename = args.save
        print('Learning rate {:.5f}'.format(lr))
        prev_loss = None
        for epoch in range(1, args.epochs+1):
            epoch_start_time = time.time()
            train_loss = train(ecg_data, model, criterion, lr, args.batch_size, args.bptt, args.clip)
            val_loss = evaluate(val_data, model, criterion, args.batch_size, args.bptt)
            print()
            print('-' * 89)
            print('| end of epoch {:3d} | time: {:5.2f}s | '
                  'train loss {:5.2f} | val loss {:5.2f}'.format(epoch, (time.time() - epoch_start_time),
                                                                 train_loss, val_loss))
            print('-' * 89)
            if not (epoch % ci):
                save_model(model, filename, epoch)
            if prev_loss and val_loss > prev_loss:
                lr /= 4
                print('New learning rate {:.5f}'.format(lr))
            prev_loss = val_loss
    except KeyboardInterrupt:
        pass
    finally:
        print()
        test_loss = evaluate(test_data, model, criterion, args.batch_size, args.bptt)
        print('=' * 89)
        print('| End of training | test loss {:5.2f} |'.format(
            test_loss))
        print('=' * 89)
        save_model(model, filename)

if __name__ == '__main__':
    main()

It’s hard to tell why it isn’t learning anything. There are a lot of factors that could cause this, and they depend on the data, the preprocessing, the model, etc. Maybe it’s one of the things I mention below, maybe something else.

If the expected output given the input is (almost) equal to the input, and the network can’t find any other pattern in your data, it’s quite logical that the best it can do to minimize the loss is to simply return what it got. Another thing: if consecutive samples barely differ, the loss will be very small even when the network just echoes its input.
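
As a quick sanity check (a minimal sketch, not from your post; `series` is just a placeholder, load your real data the way load_data() does), you can compare the model’s validation MSE against the “copy the previous sample” baseline. If the two are roughly equal, the network has most likely only learned the identity mapping:

import torch

# Persistence ("copy the previous sample") baseline for a 1-D series.
# `series` is a placeholder here; substitute your actual data.
series = torch.rand(1000)
baseline_mse = float(((series[1:] - series[:-1]) ** 2).mean())
print('persistence baseline MSE: {:.6f}'.format(baseline_mse))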

It looks like the "target =" line in your get_batch() function is correctly looking one index ahead… Perhaps the error lies not in the training but in the generation of the new time series.
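
To make that concrete, here is a toy sketch (not part of your code) of what the one-step offset in get_batch() amounts to:

import torch

source = torch.Tensor([10, 11, 12, 13, 14])
seq_len = 3
data = source[0:seq_len]           # samples fed to the network:  10, 11, 12
target = source[1:1 + seq_len]     # samples it should predict:   11, 12, 13
print(data.tolist(), target.tolist())
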
Would you mind sharing with us the code you use for generating?

@drscotthawley here’s the code to generate a sequence from a trained network:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Sample from an LSTM network to generate a time series.
'''

import argparse
import time
import math

import torch
import torch.nn as nn
from torch.autograd import Variable
from train import RNNModel

def read_arguments():
    parser = argparse.ArgumentParser(description='Sample from a recurrent network to generate a signal.')
    parser.add_argument('--checkpoint', type=str, default='./model.pt',
                        help='model checkpoint to use')
    parser.add_argument('--outf', type=str, default='generated.txt',
                        help='output file for generated signal')    
    parser.add_argument('--length', type=int, default=12500,
                        help='length of the output signal')
    args = parser.parse_args()
    return args


def main():
    args = read_arguments()
    print("Loading model...")
    with open(args.checkpoint, 'rb') as f:
        model = torch.load(f)
    if torch.cuda.is_available():
        model.cuda()
    else:
        model.cpu()

    print('Sampling...')
    try:
        hidden = model.init_hidden(1)
        output = Variable(torch.zeros(1,1,1), volatile=True)
        # output.data[0] = 0.0
        if torch.cuda.is_available():
            output.data = output.data.cuda()
        with open(args.outf, 'w') as outf:
            for _ in range(args.length):
                output, hidden = model(output, hidden)
                outf.write('{:.5f}\n'.format(output.squeeze().data.cpu()[0]))
    except KeyboardInterrupt:
        pass

if __name__ == '__main__':
    main()

@AndreaCogliati Thanks for sharing. I am very interested in this topic as well, so I’d also like to help find a solution to your problem.

I took the code that you posted and I found that, for my data, the predicted solution moves rapidly to zero. I trained with a simple sine wave (black) and got the red line when I ran your prediction code…

(this is zoomed in towards the beginning… the training data goes on for 50,000 timesteps)

I noticed a related post by Element Research that uses pure Torch.

Perhaps that would be helpful for those more in-the-know, as far as making the PyTorch implementation match the Torch implementation. I’m not at the stage where I can do that yet.

I don’t see a Sequencer() layer defined in PyTorch though; perhaps it’s not necessary.
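
If I understand correctly, a Sequencer isn’t needed because nn.LSTM already unrolls over the whole sequence when it is given a (seq_len, batch, input_size) tensor, returning one output per time step. A minimal sketch of what I mean (my own example, not from that post):

import torch
import torch.nn as nn
from torch.autograd import Variable

lstm = nn.LSTM(1, 100, 2)               # input_size=1, hidden_size=100, 2 layers
x = Variable(torch.randn(375, 10, 1))   # (seq_len, batch, features)
out, (h_n, c_n) = lstm(x)               # out: (375, 10, 100), one vector per time step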

Thanks for your interest. Yes, that’s the very same behavior I’m observing, in general. With longer training and longer backpropagation through time I was able to generate simple sinusoidal signals and even a square wave, but with more complex signals the network always ends up at a fixed point (not necessarily zero). And thanks for linking the post by Element Research. It looks like my naive model is not completely off track.

Can you please explain the purpose of your read_arguments function? What does the argparser do here (I always see it in PyTorch examples)? Also, what is an application of generating time series data (I mean, how would that be useful)? Please excuse my limited knowledge, as I am still new to deep learning. Thanks for the help.

argparse is just for parsing command-line parameters, so you can change some parameters of the model without changing the source code.
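
Here’s a minimal, standalone illustration (not taken from the training script) of what argparse does:

import argparse

parser = argparse.ArgumentParser(description='Tiny argparse demo')
parser.add_argument('--nhid', type=int, default=100, help='number of hidden units')
parser.add_argument('--lr', type=float, default=0.05, help='learning rate')

# parse_args() normally reads sys.argv (whatever you typed after "python train.py");
# here the arguments are passed explicitly just for the demo.
args = parser.parse_args(['--nhid', '200'])
print(args.nhid, args.lr)   # 200 0.05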

The idea behind generating time series is to show that a neural network can automatically learn important features in the training data and generate new data that resembles the given examples. For an introduction to the task, I would suggest reading Andrej Karpathy’s blog post.

What kind of “more complex signals” are you having trouble with?

For instance a combination of 3 sinusoids.
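
Roughly something like the following (just a sketch: the length, frequencies, and amplitudes here are arbitrary, not the exact values I used), written in the one-value-per-line format the training script reads:

import numpy as np

t = np.arange(50000)
signal = (np.sin(2 * np.pi * t / 100)
          + 0.5 * np.sin(2 * np.pi * t / 37)
          + 0.25 * np.sin(2 * np.pi * t / 13))
np.savetxt('data.txt', signal, fmt='%.5f')   # one sample per line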

@AndreaCogliati Did you manage to solve this issue? I am facing a similar problem at the moment

No, I didn’t, sorry. However, I haven’t spent much time on it lately, since I have been busy with other projects.

I’m writing to bump this thread.
I gave up on PyTorch a while back, and went back to Keras because of these issues and the lack of actual resolution in the forum on this.
But I’m interested in trying PyTorch again.

I notice that there’s now an official PyTorch/example for Time Sequence Prediction (https://github.com/pytorch/examples/tree/master/time_sequence_prediction), and was happy to try it out, but it just produces the same output as we described above, namely it just dives for a fixed point and stays there (screenshot of predict14.pdf):

Can someone comment on this? It’s odd when even the example code doesn’t work.

I get not perfect but reasonable predictions (the following image is after 25 steps) once the MSE is around 1e-5 to 3e-5. Sometimes the amplitudes vary more wildly, but that might be expected. What is the MSE you get?

Best regards

Thomas

Tom, thanks for writing back. Wow, that’s great! You seem to be in the minority, though: since posting, I found there’s an issue on GitHub where people are reporting the behavior I describe.

So, what is it that you’re doing that’s different from the rest of us?

One thing seems to be that you’re running for more steps (25): the example stops at 15. But I don’t think that would make a difference, because the loss seems to “flatten out” fairly early, e.g. by step 3. Output is…

STEP: 0
loss: 0.537086230955
loss: 0.519054139134
loss: 0.334312643402
loss: 0.245408088178
loss: 0.241630530524
loss: 0.236310649411
loss: 0.227821229553
loss: 0.212925355116
loss: 0.18457783161
loss: 0.134251495296
loss: 0.0847459995036
loss: 0.0574250318391
loss: 0.0512263487439
loss: 0.0473003375388
loss: 0.039150461162
loss: 0.0352442710805
loss: 0.0323013329392
loss: 0.0247076271594
loss: 0.0190413202173
loss: 0.0149685327246
test loss: 0.0122880477308
STEP: 1
loss: 0.0123111050907
loss: 0.00789734904096
loss: 0.00671599497278
loss: 0.00353805780334
loss: 0.0032202868875
loss: 0.00275785860848
loss: 0.00255092406484
loss: 0.00235055785575
loss: 0.00218358551007
loss: 0.00186726224563
loss: 0.00158670291558
loss: 0.00253018196455
loss: 0.00124947355056
loss: 0.00129163893022
loss: 0.00115403097926
loss: 0.00113527961892
loss: 0.00111034928823
loss: 0.00110101619577
loss: 0.0010856783608
loss: 0.00107512001811
test loss: 0.00151556788016
STEP: 2
loss: 0.00106017296511
loss: 0.00101216329981
loss: 0.195079458435
loss: 1.49997814199
test loss: 1.50362019943
STEP: 3
loss: 1.49997814199
test loss: 1.50362019943
STEP: 4
loss: 1.49997814199
test loss: 1.50362019943
STEP: 5
loss: 1.49997814199
test loss: 1.50362019943
STEP: 6
loss: 1.49997814199
test loss: 1.50362019943
STEP: 7
loss: 1.49997814199
test loss: 1.50362019943
STEP: 8
loss: 1.49997814199
test loss: 1.50362019943
STEP: 9
loss: 1.49997814199
test loss: 1.50362019943
STEP: 10
loss: 1.49997814199
test loss: 1.50362019943
STEP: 11
loss: 1.49997814199
test loss: 1.50362019943
STEP: 12
loss: 1.49997814199
test loss: 1.50362019943
STEP: 13
loss: 1.49997814199
test loss: 1.50362019943
STEP: 14
loss: 1.49997814199
test loss: 1.50362019943

…and all the predict*.pdf files from predict2.pdf onward look like the graph I posted.

Hello Scott,

I’m not sure I have much of a secret sauce, but here is my notebook. It works with a recent master checkout of pytorch, but I don’t think it was broken before.
[Edit:] Hmhm. Now that I look at it, it has the same error of using c_t instead of h_t as the hidden state. I updated the notebook to do the right thing and added a linear layer.

Best regards

Thomas

Thanks for sharing your code and for taking the time to help, Thomas! The code in your notebook is a bit different from what’s in the example as it is right now. Apart from being CUDA-enabled (which is great!), in yours, both the “outputs” lines in the forward part of the Sequence class look like this:
outputs += [c_t2]

but in the PyTorch/examples entry, the corresponding lines read
outputs += [h_t2]
…Not sure how significant that is; I’m still learning.
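
For my own understanding, here’s a minimal sketch of the distinction (my own example, not the exact example code): nn.LSTMCell returns (h, c), and it’s the hidden state h, not the cell state c, that normally feeds the next layer and the collected outputs.

import torch
import torch.nn as nn
from torch.autograd import Variable

cell1 = nn.LSTMCell(1, 51)                  # input_size=1, hidden_size=51
cell2 = nn.LSTMCell(51, 51)
x = Variable(torch.randn(3, 10, 1))         # (seq_len, batch, features)
h1 = Variable(torch.zeros(10, 51)); c1 = Variable(torch.zeros(10, 51))
h2 = Variable(torch.zeros(10, 51)); c2 = Variable(torch.zeros(10, 51))
outputs = []
for t in range(x.size(0)):                  # step through the sequence manually
    h1, c1 = cell1(x[t], (h1, c1))
    h2, c2 = cell2(h1, (h2, c2))            # feed h1 (hidden), not c1 (cell state)
    outputs.append(h2)                      # collect h2, not c2
outputs = torch.stack(outputs)              # (seq_len, batch, 51)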

When I run your code, the intermediate images look like things are working, and the loss gets really small (like 3e-5)…

but shortly after that, the loss begins increasing until the final value is around 0.56 and the final image looks like…

Just for definiteness: I’m running Python 3.5 via the Anaconda distribution, with CUDA 8.0 and cuDNN 5.1.10 on Ubuntu 16.04. What are you using?
Thanks again.

I updated the notebook and it still works.
The problem is likely that you don’t want a nonlinearity at the end, so I added a linear layer.
This is likely a good solution for the example.
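
Schematically (a sketch of the idea, not the exact code in my notebook or the pull request), the change amounts to projecting h_t2 through an nn.Linear instead of emitting it directly, since h_t2 has already been squashed into (-1, 1) by the LSTM’s tanh:

import torch
import torch.nn as nn
from torch.autograd import Variable

class Sequence(nn.Module):
    def __init__(self, nhid=51):
        super(Sequence, self).__init__()
        self.cell1 = nn.LSTMCell(1, nhid)
        self.cell2 = nn.LSTMCell(nhid, nhid)
        self.linear = nn.Linear(nhid, 1)     # the added, unbounded output layer
        self.nhid = nhid

    def forward(self, x):                    # x: (seq_len, batch, 1)
        bsz = x.size(1)
        h1 = Variable(x.data.new(bsz, self.nhid).zero_())
        c1 = Variable(x.data.new(bsz, self.nhid).zero_())
        h2 = Variable(x.data.new(bsz, self.nhid).zero_())
        c2 = Variable(x.data.new(bsz, self.nhid).zero_())
        outputs = []
        for t in range(x.size(0)):
            h1, c1 = self.cell1(x[t], (h1, c1))
            h2, c2 = self.cell2(h1, (h2, c2))
            outputs.append(self.linear(h2))  # project h2 instead of emitting it raw
        return torch.stack(outputs)          # (seq_len, batch, 1)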

Best regards

Thomas

[Edit:] pull request (even if they should put a picture of the output rather than yours truly next to it)

Thanks Thomas! That’s it. I confirm that your code yields the pictured result. So glad to be able to move on.

I’m also able to run the example code now that your pull request has been accepted.

I’ll consider my part of this thread as “closed”.

I also added one more pull request for the example, incorporating cuda() definitions as well.