Rookie: Not getting the LSTM sine time sequence example to work

vegardde · April 23, 2018, 2:40pm

Hi, I am new to PyTorch, but this exercise should be fairly easy. I am just trying to run the train.py script from the time sequence prediction example here: https://github.com/pytorch/examples/tree/master/time_sequence_prediction

The script crashes in the second iteration (the first iteration is fine) on the line:

loss = criterion(pred[:, :-future], test_target)

Anyone else having this problem?

I'm running Python 3.6.5 on Windows, with PyTorch installed via
conda install -c peterjc123 pytorch-cpu

peterjc123 · April 23, 2018, 2:47pm

I cannot reproduce this issue. What are your versions of NumPy and MKL? Some users reported that they resolved this by updating MKL. Related issue here.

vegardde · April 24, 2018, 7:33am

I tried reinstalling MKL and NumPy, but the crash still persists. I have NumPy 1.14.2, and below is my NumPy configuration:

mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\include']
blas_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\include']
lapack_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/bc9467/AppData/Local/Continuum/miniconda2/envs/py36_64\\Library\\include']

peterjc123 · April 24, 2018, 9:51am

Why does it include the files from both the Intel version and the conda version?
Could you remove MKL and NumPy completely and then install them through conda using the command below?
conda install mkl numpy

vegardde · April 24, 2018, 1:51pm

I removed and reinstalled both. I also manually edited the NumPy config file to include only the conda version. (The Intel directory did not exist anyhow.) However, that didn't fix the issue. The script still crashes when calculating the loss in the second iteration:

loss = criterion(pred[:, :-future], test_target)

peterjc123 · April 24, 2018, 2:06pm

Sorry, I cannot fix it if I cannot reproduce it. Do other example scripts crash too?

vegardde · April 24, 2018, 2:18pm

No worries. I will try reinstalling the Python environment from scratch to see if that resolves the issue. But could this be some kind of accuracy issue, i.e. the error getting too small?

peterjc123 · April 24, 2018, 2:23pm

No, that wouldn't be the issue. The loss during the second epoch should be somewhere around 0.1 to 1.0.

vegardde · April 24, 2018, 2:28pm

STEP: 1
loss: 0.0004913884075178446
loss: 0.0004082079433761155
loss: 0.0003697542508279457
loss: 0.00035965182196047695
loss: 0.00035303319998621654
loss: 0.00032955205548351274
loss: 0.00029846660034503773
loss: 0.000255572306385164
loss: 0.00021114578472401712
loss: 0.00017208486384337886
loss: 0.00014739358116626456
loss: 0.00013127051505061504
loss: 0.00012368827007625956
loss: 0.00012022047018228365
loss: 0.00011957470975385647
loss: 0.00011948205638343821
loss: 0.0001193616788291415
loss: 0.0001190189211443516
loss: 0.0001183731790543957
loss: 0.00011652698350637044

peterjc123 · April 24, 2018, 2:42pm

Sorry, I misremembered. But I really don't think that is the issue.

vegardde · April 25, 2018, 1:56pm

So I have made no progress here. It still crashes after reinstalling Python. Is there any way I can get more information about the crash, from logs or something similar?
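
One general-purpose option for hard crashes like this (not something suggested in the thread) is Python's standard-library faulthandler module, which dumps the Python traceback when the interpreter dies from a fatal error. The sketch below is a minimal illustration; the helper name enable_crash_log and the log file name are hypothetical, and it only reports Python-level frames, not the native stack inside PyTorch or MKL.

```python
import faulthandler


def enable_crash_log(path="crash_traceback.log"):
    """Write Python tracebacks to `path` if the process dies from a fatal error.

    `enable_crash_log` and the default file name are hypothetical names
    chosen for this sketch.
    """
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned to the caller instead of being closed here.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Call this at the very top of the script, before the training loop.
    crash_log = enable_crash_log()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without editing the script by running python -X faulthandler train.py, in which case the traceback goes to stderr.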

peterjc123 · April 26, 2018, 5:15am

Would you please try updating the pytorch package to 0.4.0?

vegardde · May 7, 2018, 2:06pm

I uninstalled all Python versions on my computer and reinstalled a clean version of Python 3.6, then installed PyTorch 0.4.0. However, the problem still persists.

vegardde · May 7, 2018, 2:37pm

I got it working now by getting the latest code from the git repo, but I am still a bit puzzled about why. The first code below fails in iteration 2 on loss = criterion(pred[:, :-future], test_target)

from __future__ import print_function
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class Sequence(nn.Module):
    def __init__(self):
        super(Sequence, self).__init__()
        self.lstm1 = nn.LSTMCell(1, 51)
        self.lstm2 = nn.LSTMCell(51, 51)
        self.linear = nn.Linear(51, 1)

    def forward(self, input, future = 0):
        outputs = []
        h_t = Variable(torch.zeros(input.size(0), 51).double(), requires_grad=False)
        c_t = Variable(torch.zeros(input.size(0), 51).double(), requires_grad=False)
        h_t2 = Variable(torch.zeros(input.size(0), 51).double(), requires_grad=False)
        c_t2 = Variable(torch.zeros(input.size(0), 51).double(), requires_grad=False)

        for i, input_t in enumerate(input.chunk(input.size(1), dim=1)):
            h_t, c_t = self.lstm1(input_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        for i in range(future):# if we should predict the future
            h_t, c_t = self.lstm1(output, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        outputs = torch.stack(outputs, 1).squeeze(2)
        return outputs



if __name__ == '__main__':
    # set random seed to 0
    np.random.seed(0)
    torch.manual_seed(0)
    # load data and make training set
    data = torch.load('traindata.pt')
    input = Variable(torch.from_numpy(data[3:, :-1]), requires_grad=False)
    target = Variable(torch.from_numpy(data[3:, 1:]), requires_grad=False)
    test_input = Variable(torch.from_numpy(data[:3, :-1]), requires_grad=False)
    test_target = Variable(torch.from_numpy(data[:3, 1:]), requires_grad=False)
    # build the model
    seq = Sequence()
    seq.double()
    criterion = nn.SmoothL1Loss()
    # use LBFGS as optimizer since we can load the whole data to train
    optimizer = optim.LBFGS(seq.parameters(), lr=0.8)
    #begin to train
    for i in range(15):
        print('STEP: ', i)
        def closure():
            optimizer.zero_grad()
            out = seq(input)
            loss = criterion(out, target)
            print('loss:', loss.data.numpy())
            loss.backward()
            return loss
        optimizer.step(closure)
        # begin to predict
        future = 1000
        pred = seq(test_input, future = future)
        print('test_target',test_target.data.numpy())
        print('future:',future)
        print('pred:',pred.data.numpy())
        loss = criterion(pred[:, :-future], test_target)
        print('test loss:', loss.data.numpy())
        y = pred.data.numpy()
        # draw the result
        plt.figure(figsize=(30,10))
        plt.title('Predict future values for time sequences\n(Dashlines are predicted values)', fontsize=30)
        plt.xlabel('x', fontsize=20)
        plt.ylabel('y', fontsize=20)
        plt.xticks(fontsize=20)
        plt.yticks(fontsize=20)
        def draw(yi, color):
            plt.plot(np.arange(input.size(1)), yi[:input.size(1)], color, linewidth = 2.0)
            plt.plot(np.arange(input.size(1), input.size(1) + future), yi[input.size(1):], color + ':', linewidth = 2.0)
        draw(y[0], 'r')
        draw(y[1], 'g')
        draw(y[2], 'b')
        plt.savefig('predict%d.pdf'%i)
        plt.close()

However, the latest code from the GitHub repo runs without issues:

from __future__ import print_function
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class Sequence(nn.Module):
    def __init__(self):
        super(Sequence, self).__init__()
        self.lstm1 = nn.LSTMCell(1, 51)
        self.lstm2 = nn.LSTMCell(51, 51)
        self.linear = nn.Linear(51, 1)

    def forward(self, input, future = 0):
        outputs = []
        h_t = torch.zeros(input.size(0), 51, dtype=torch.double)
        c_t = torch.zeros(input.size(0), 51, dtype=torch.double)
        h_t2 = torch.zeros(input.size(0), 51, dtype=torch.double)
        c_t2 = torch.zeros(input.size(0), 51, dtype=torch.double)

        for i, input_t in enumerate(input.chunk(input.size(1), dim=1)):
            h_t, c_t = self.lstm1(input_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        for i in range(future):# if we should predict the future
            h_t, c_t = self.lstm1(output, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        outputs = torch.stack(outputs, 1).squeeze(2)
        return outputs


if __name__ == '__main__':
    # set random seed to 0
    np.random.seed(0)
    torch.manual_seed(0)
    # load data and make training set
    data = torch.load('traindata.pt')
    input = torch.from_numpy(data[3:, :-1])
    target = torch.from_numpy(data[3:, 1:])
    test_input = torch.from_numpy(data[:3, :-1])
    test_target = torch.from_numpy(data[:3, 1:])
    # build the model
    seq = Sequence()
    seq.double()
    criterion = nn.MSELoss()
    # use LBFGS as optimizer since we can load the whole data to train
    optimizer = optim.LBFGS(seq.parameters(), lr=0.8)
    #begin to train
    for i in range(15):
        print('STEP: ', i)
        def closure():
            optimizer.zero_grad()
            out = seq(input)
            loss = criterion(out, target)
            print('loss:', loss.item())
            loss.backward()
            return loss
        optimizer.step(closure)
        # begin to predict, no need to track gradient here
        with torch.no_grad():
            future = 1000
            pred = seq(test_input, future=future)
            loss = criterion(pred[:, :-future], test_target)
            print('test loss:', loss.item())
            y = pred.detach().numpy()
        # draw the result
        plt.figure(figsize=(30,10))
        plt.title('Predict future values for time sequences\n(Dashlines are predicted values)', fontsize=30)
        plt.xlabel('x', fontsize=20)
        plt.ylabel('y', fontsize=20)
        plt.xticks(fontsize=20)
        plt.yticks(fontsize=20)
        def draw(yi, color):
            plt.plot(np.arange(input.size(1)), yi[:input.size(1)], color, linewidth = 2.0)
            plt.plot(np.arange(input.size(1), input.size(1) + future), yi[input.size(1):], color + ':', linewidth = 2.0)
        draw(y[0], 'r')
        draw(y[1], 'g')
        draw(y[2], 'b')
        plt.savefig('predict%d.pdf'%i)
        plt.close()

There are some slight differences, but I am not proficient enough in PyTorch to understand why they should make a difference.

peterjc123 · May 7, 2018, 6:31pm

The first script is missing torch.no_grad() around the prediction step; see the migration guide for details.
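
As a rough illustration of what torch.no_grad() changes (a minimal sketch for PyTorch 0.4.0 or later; the tensors w and x are hypothetical stand-ins, not taken from train.py): operations inside the block are not recorded by autograd, so the 1000-step future prediction does not build a graph. The thread does not establish exactly why the untracked version avoids the crash, so this only shows the difference in behavior.

```python
import torch

# Hypothetical stand-in for a trained model parameter.
w = torch.ones(3, requires_grad=True)
x = torch.ones(3)

# Outside no_grad: the result is attached to the autograd graph.
tracked = (w * x).sum()

# Inside no_grad: the same computation is not recorded by autograd.
with torch.no_grad():
    untracked = (w * x).sum()

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(untracked.grad_fn)        # None
```

In the working script, the whole prediction step (the seq(test_input, future=future) call and the test loss) sits inside such a block.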