Understanding LSTM input

Jason_Hinson · December 3, 2018, 1:01am

I am trying to implement an LSTM model to predict the stock price of the next day using a sliding window. I have implemented the code in keras previously and keras LSTM looks for a 3d input of (timesteps, (batch_size, features)). I have read through tutorials and watched videos on the pytorch LSTM model and I still can’t understand how to implement it. I am going to make up some stock data to use as an example so we can be on the same page. I have a tensor filled with data points incremented by hour (time), e.g.

data_from_csv = [[time, open, close, high, low, volume],
                 [1,10,11,15,9,100],
                 [2,11,12,16,9,100],
                 [3,12,13,17,9,100],
                 [4,13,14,18,9,100],
                 [5,14,15,19,9,100],
                 [6,15,16,10,9,100]]

if my window size is 2 then I would take batches like:

[[1,10,11,15,9,100],
 [2,11,12,16,9,100]]

[[2,11,12,16,9,100],
 [3,12,13,17,9,100]]

[[3,12,13,17,9,100],
 [4,13,14,18,9,100]]

etc.
giving me 5 batches of (2,6). in keras that would be an input of (5,(2,6)) where 5 is the samples or number of timesteps. 2 is the batch/window length and 6 is the number of features. I don’t think I fully understand what pytorch expects for each input/parameter.
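The windowing described above can be sketched in plain Python as follows. This is only an illustration on the made-up rows from this post; `sliding_windows` is a hypothetical helper name, not something from keras or pytorch.

```python
# Made-up rows from the post: [time, open, close, high, low, volume]
rows = [
    [1, 10, 11, 15, 9, 100],
    [2, 11, 12, 16, 9, 100],
    [3, 12, 13, 17, 9, 100],
    [4, 13, 14, 18, 9, 100],
    [5, 14, 15, 19, 9, 100],
    [6, 15, 16, 10, 9, 100],
]


def sliding_windows(rows, window_size):
    """Return every run of `window_size` consecutive rows (stride 1)."""
    return [rows[i:i + window_size] for i in range(len(rows) - window_size + 1)]


windows = sliding_windows(rows, window_size=2)

# 5 windows, each holding 2 rows of 6 values
print(len(windows), len(windows[0]), len(windows[0][0]))  # 5 2 6
```

With 6 rows and a window of 2 there are 6 - 2 + 1 = 5 overlapping windows, which is where the 5 comes from.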

in the pytorch docs: nn.LSTM the parameters are:

input_size: the number of expected features

In keras that would be [time, open, close, high, low, volume] or an input_size of 6 different data labels per timestep.

hidden_size: number of features in the hidden state

I take this as how many lstm cells are in the hidden layer(s) and how many outputs the first layer will have.

num_layers: number of recurrent layers

I take this as the number of hidden LSTM layers. so if num_layers = 1 then I will have an input, hidden layer, and output layer

There are more parameters listed but then it describes the input as:

Inputs: input,(h_0,c_0)
where input is of shape(seq_len, batch, input_size)

the docs then go on to use the example:

rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

from what I can tell this is what it would look like with the parameter names substituted into the values:

rnn = nn.LSTM(input_size, hidden_size, num_layers)
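Carrying the same substitution through the remaining lines of the docs example gives the sketch below. The variable names are only labels chosen for illustration; the shapes follow the nn.LSTM convention that the initial states are (num_layers, batch, hidden_size) and, with the default batch_first=False, the input is (seq_len, batch, input_size).

```python
import torch
import torch.nn as nn

# Labels for the numbers used in the docs example
input_size, hidden_size, num_layers = 10, 20, 2
seq_len, batch = 5, 3

rnn = nn.LSTM(input_size, hidden_size, num_layers)

inputs = torch.randn(seq_len, batch, input_size)   # (5, 3, 10)
h0 = torch.randn(num_layers, batch, hidden_size)   # (2, 3, 20)
c0 = torch.randn(num_layers, batch, hidden_size)   # (2, 3, 20)

output, (hn, cn) = rnn(inputs, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20]): top layer's hidden state at every time step
print(hn.shape)      # torch.Size([2, 3, 20]): final hidden state of each layer
print(cn.shape)      # torch.Size([2, 3, 20]): final cell state of each layer
```

Only the last dimension of the input has to match input_size, and only the last dimension of h0 and c0 has to match hidden_size.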

when they instantiate the “input” variable, only one number matches so I assume the “10” represents columns in a data set as features. so if I were to use pytorch’s example with my example data_from_csv above, with 2 hidden layers that accept double the input features, would it look as follows?

rnn = nn.LSTM(6, 12, 2)
input = np.ndarray([[[1,10,11,15,9,100],
                     [2,11,12,16,9,100]]])
h0 = torch.randn(2, 1, 12)
c0 = torch.randn(2, 1, 12)
output, (hn, cn) = rnn(input, (h0, c0))

when pytorch asks for “seq_length” is it asking for how long the window/batch is or how many batches are in the dataset?
when pytorch asks for “batch” is it asking for the whole window of data?
when pytorch asks for “features” is it talking about number of columns of data like time, open, close etc.? is that the same as “input_size”

The more I look at this the less I think I understand it. Let me know if I’m on the right track and please define each term based on my stock example data so I can understand. I almost guarantee I will have some clarifying questions but I cannot find a good tutorial or example that uses multivariate input and any help would be appreciated. Thanks.

Crohn_eng · December 14, 2018, 1:53pm

Hi Jason,

just wanted to let you know that you are not alone: I am running into the same difficulties in understanding how to use LSTM networks properly.
Moreover, I could not even find the source code for them: reading the docs at https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html#LSTM, the “forward” method uses the “torch._C._VariableFunctions.lstm_cell” implementation, which I could not find anywhere.
I think it might be some C++ integration among those in https://github.com/pytorch/pytorch/tree/master/torch/csrc, but still, I really don’t know where to look.
Having at least the source code to inspect could be a starting point.

Jason_Hinson · December 14, 2018, 5:58pm

I think the forward function uses C++ only when it handles tensors. Since the forward function defines the path of the tensors through the nn, it handles everything using C++. Besides using C++ for nn computations, everything you need to design the flow through the nn(s) is in python, which is in the docs. I got an LSTM network running in pytorch by following the Pytorch tutorial example for sentiment analysis, but it took a lot of hacking to make it work on anything besides word analysis. It still outputs a prediction for every time step instead of one per batch, which is different from what I’m used to in keras. I’m now almost certain that this has to do with the forward function, but I still can’t find any information on common practices for handling the forward function. One thing I had to get right before it worked was making sure the hidden state and cell state were grabbed and passed to the lstm correctly.
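One common way to get a single prediction per window rather than one per time step is to keep only the last time step of the LSTM output inside forward. The sketch below assumes a batch_first layout and a one-value regression target; `LastStepRegressor` and its `head` layer are hypothetical names used for illustration, not part of the tutorial mentioned above.

```python
import torch
import torch.nn as nn


class LastStepRegressor(nn.Module):
    """Hypothetical model: LSTM followed by a linear layer on the last time step."""

    def __init__(self, input_size=6, hidden_size=12, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, seq_len, input_size). When no (h0, c0) is passed,
        # nn.LSTM starts from zero hidden and cell states.
        output, (hn, cn) = self.lstm(x)
        last_step = output[:, -1, :]      # (batch, hidden_size)
        return self.head(last_step)       # (batch, 1): one prediction per window


model = LastStepRegressor()
x = torch.randn(4, 2, 6)                  # 4 windows, 2 time steps, 6 features
pred = model(x)
print(pred.shape)                         # torch.Size([4, 1])
```

For a unidirectional LSTM, `output[:, -1, :]` holds the same values as `hn[-1]`, so either can feed the final layer.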

Crohn_eng · December 26, 2018, 4:09pm

Hi Jason,

I am really sorry for answering you after such a long delay, I have been really busy over the Christmas holidays.
In the meantime I managed to make some LSTM networks work: not sure if properly… but still, if you need help, or if you have figured out how to use LSTMs and would like to compare your work with mine, I will try to answer the questions you asked in the first post.

> when pytorch asks for “seq_length” is it asking for how long the window/batch is or how many batches are in the dataset?

For your first question, I would say that seq_length corresponds to the length of each window, not to the number of batches in the dataset.

Looking at this post ( Why 3d input tensors in LSTM? ), what I have understood is that with PyTorch’s DataLoaders you can divide your dataset into batches, and each element of a batch can contain a window of seq_length samples.

Let’s make an example: suppose that your dataset contains 10 samples, and you would like to have 5 batches, with each element corresponding to a sequence of 2 samples.
With 10 samples in your dataset and 5 batches, your batch_size will be equal to 2, and since each element of the batch is a window/sequence of two samples, a single batch looks something like:

tensor([[[  1.,  10.,  11.,  15.,   9., 100.],
         [  2.,  11.,  12.,  16.,   9., 100.]],

        [[  2.,  11.,  12.,  16.,   9., 100.],
         [  3.,  12.,  13.,  17.,   9., 100.]]])

therefore, with dimensions (2=batch_size, 2=seq_length, 6=input_size).
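One detail worth making explicit: by default nn.LSTM reads its input as (seq_length, batch_size, input_size), so a tensor laid out as (batch_size, seq_length, input_size) like the one above needs batch_first=True, or a transpose of the first two dimensions. The sketch below uses three windows instead of two so that the batch and sequence dimensions can be told apart; hidden_size=12 is an arbitrary choice.

```python
import torch
import torch.nn as nn

# Three windows of two rows each: (batch_size=3, seq_length=2, input_size=6)
batch = torch.tensor([[[1., 10., 11., 15., 9., 100.],
                       [2., 11., 12., 16., 9., 100.]],
                      [[2., 11., 12., 16., 9., 100.],
                       [3., 12., 13., 17., 9., 100.]],
                      [[3., 12., 13., 17., 9., 100.],
                       [4., 13., 14., 18., 9., 100.]]])

# Option 1: tell the LSTM that the batch dimension comes first
lstm_bf = nn.LSTM(input_size=6, hidden_size=12, batch_first=True)
out_bf, (hn, cn) = lstm_bf(batch)
print(out_bf.shape)  # torch.Size([3, 2, 12])
print(hn.shape)      # torch.Size([1, 3, 12]): states stay (num_layers, batch, hidden)

# Option 2: keep the default layout and swap the first two dimensions
lstm_default = nn.LSTM(input_size=6, hidden_size=12)
lstm_default.load_state_dict(lstm_bf.state_dict())  # same weights, for comparison
out_tm, _ = lstm_default(batch.transpose(0, 1))
print(out_tm.shape)  # torch.Size([2, 3, 12])

# Both options compute the same values, just laid out differently
print(torch.allclose(out_bf, out_tm.transpose(0, 1), atol=1e-5))  # True
```

Note that batch_first only affects the input and output tensors; hn and cn keep the (num_layers, batch, hidden_size) layout either way.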

In order to do so, you have to customize your Dataset class so that each __getitem__ call returns a window of elements (seq_length of them), while with the DataLoader you can play with the batch dimension in order to have a different number of windows in each batch.
I wrote this little demo using the tensor you provided before so that my example may be clearer to you:

import torch
import torch.utils.data as Data
import torch.nn as nn
import torchvision.transforms as transforms

###   Demo dataset

data_from_csv = [[1, 10, 11, 15, 9, 100],
                 [2, 11, 12, 16, 9, 100],
                 [3, 12, 13, 17, 9, 100],
                 [4, 13, 14, 18, 9, 100],
                 [5, 14, 15, 19, 9, 100],
                 [6, 15, 16, 10, 9, 100],
                 [7, 15, 16, 10, 9, 100],
                 [8, 15, 16, 10, 9, 100],
                 [9, 15, 16, 10, 9, 100],
                 [10, 15, 16, 10, 9, 100]]


###   Demo Dataset class

class DemoDatasetLSTM(Data.Dataset):

    """
        Support class for the loading and batching of sequences of samples

        Args:
            dataset (Tensor): Tensor containing all the samples
            sequence_length (int): length of the analyzed sequence by the LSTM
            transforms (object torchvision.transform): Pytorch's transforms used to process the data
    """

    ##  Constructor
    def __init__(self, dataset, sequence_length=1, transforms=None):
        self.dataset = dataset
        self.seq_length = sequence_length
        self.transforms = transforms

    ##  Override total dataset's length getter
    def __len__(self):
        return self.dataset.__len__()

    ##  Override single items' getter
    def __getitem__(self, idx):
        # Slicing past the end of the data simply yields a shorter window
        window = self.dataset[idx:idx + self.seq_length]
        n_valid = len(window)
        if self.transforms is not None:
            window = self.transforms(window)
        if n_valid < self.seq_length:
            # The window runs past the end of the dataset: zero-pad it to seq_length
            if self.transforms is not None:
                item = torch.zeros(self.seq_length, len(self.dataset[0]))
                item[:n_valid] = window
            else:
                # Without transforms the data is assumed to be a list of lists
                pad_row = [0] * len(self.dataset[0])
                item = list(window) + [pad_row] * (self.seq_length - n_valid)
            return item, item
        # Input and target are the same window
        return window, window


###   Helper for transforming the data from a list to Tensor

def listToTensor(rows):
    # rows: list of equal-length lists of numbers (avoid shadowing the built-in `list`)
    tensor = torch.empty(len(rows), len(rows[0]))
    for i in range(len(rows)):
        tensor[i, :] = torch.FloatTensor(rows[i])
    return tensor

###   Dataloader instantiation

# Parameters
seq_length = 2
batch_size = 2
data_transform = transforms.Lambda(lambda x: listToTensor(x))

dataset = DemoDatasetLSTM(data_from_csv, seq_length, transforms=data_transform)
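# num_workers=0 loads data in the main process; worker processes would need a
# picklable transform (not a lambda) on platforms that spawn their workers
data_loader = Data.DataLoader(dataset, batch_size, shuffle=False, num_workers=0)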

for data in data_loader:
    x, _ = data
    print(x)

As you can see, batch_size determines how the windows in your dataset are grouped into batches, while the length of a single window (seq_length) is imposed directly by customizing the Dataset class.
You may want, for instance, 5 batches of 2 windows of length 2 as before (batch_size=2, seq_length=2), or 10 batches of a single window of length 2 (batch_size=1, seq_length=2): it really depends on the specific nature of your problem and of your dataset.

Be careful though: as you can see, it may happen that not all your sequences are of the same length!
In the example I wrote before, for instance, the last element of the last batch would be a window with a single row (since we would have reached the end of the dataset): in this case, you have to provide some rule so that all sequences have the same length, since the default DataLoader won’t batch elements with different dimensions.
That’s why I wrote those checks on the value of idx + self.seq_length.
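An alternative rule, instead of zero-padding, is to make the dataset report only the start positions that still leave room for a full window, so incomplete windows are never produced. The sketch below is one possible way to do that under the assumption that the data fits in a single tensor; `FullWindowDataset` is a hypothetical name and the rows are made up.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FullWindowDataset(Dataset):
    """Hypothetical dataset that only yields complete windows."""

    def __init__(self, data, seq_length):
        self.data = torch.as_tensor(data, dtype=torch.float32)
        self.seq_length = seq_length

    def __len__(self):
        # Count only start positions that leave room for a full window
        return max(len(self.data) - self.seq_length + 1, 0)

    def __getitem__(self, idx):
        return self.data[idx:idx + self.seq_length]


# Ten made-up rows: [time, open, close, high, low, volume]
rows = [[t, 10 + t, 11 + t, 15 + t, 9, 100] for t in range(1, 11)]

loader = DataLoader(FullWindowDataset(rows, seq_length=2), batch_size=3)
shapes = [tuple(batch.shape) for batch in loader]
print(shapes)  # [(3, 2, 6), (3, 2, 6), (3, 2, 6)]
```

Ten rows with seq_length=2 give 9 complete windows, so no padding check is needed in __getitem__.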

> when pytorch asks for “batch” is it asking for the whole window of data?
> when pytorch asks for “features” is it talking about number of columns of data like time, open, close etc.? is that the same as “input_size”?

I hope that the explanation I gave you before answers your second question. Each element of the batch can be a window of data, it’s up to you.
For the last one, yes, when PyTorch asks for features it is talking about the number of columns, which corresponds to the “input_size” too.

I hope this comment may be helpful to you (and not too late): let me know what you think about it, or if I have made any errors, in my example code or in my understanding of how LSTMs work.
I wish you and your family happy holidays!

Chris_Oosthuizen · May 18, 2019, 6:32pm

This is very helpful. Thank you.

I note that both x and y are returning the same item. Shouldn’t y be the next item in the series, in order to train the LSTM?
x

tensor([[[  1.,  10.,  11.,  15.,   9., 100.],
         [  2.,  11.,  12.,  16.,   9., 100.]]])

y

tensor([[[  3.,  12.,  13.,  17.,   9., 100.]]])

I’m struggling to get guidance on what the shape of target should be.

I’ve been trying to amend this code to return x and y but I’m getting errors everywhere.

Crohn_eng · May 19, 2019, 10:14am

Hey @Chris_Oosthuizen,

thanks for your feedback! I’m glad my answer helped you!

> I note that both x and y are returning the same item.

You are right! To be honest, when I wrote this code my goal was just to give an example of how to prepare the input for an LSTM network properly. Moreover, at the time I was working with recurrent autoencoders, so what I wanted was not to predict the “future”, but to better represent the “present” using information from the “past”. For this reason x and y return the same item!

> Shouldn’t y be the next item in the series to train LSTM.
>
> I’m struggling to get guidance on what the shape of target should be.

If I have understood what you are trying to implement, I suggest you look at this repository, and at this post (LSTM time sequence generation).
I hope it is close to what you are looking for!
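For the x/y split asked about above, a minimal sketch could look like the following, assuming the target is simply the full row that comes right after each window. `NextStepDataset` is a hypothetical name; this is not the code from the linked repository or post.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class NextStepDataset(Dataset):
    """Hypothetical dataset: x is a window of rows, y is the row right after it."""

    def __init__(self, data, seq_length):
        self.data = torch.as_tensor(data, dtype=torch.float32)
        self.seq_length = seq_length

    def __len__(self):
        # Each sample needs seq_length input rows plus one target row
        return max(len(self.data) - self.seq_length, 0)

    def __getitem__(self, idx):
        x = self.data[idx:idx + self.seq_length]   # (seq_length, features)
        y = self.data[idx + self.seq_length]       # (features,)
        return x, y


data = [[1, 10, 11, 15, 9, 100],
        [2, 11, 12, 16, 9, 100],
        [3, 12, 13, 17, 9, 100],
        [4, 13, 14, 18, 9, 100],
        [5, 14, 15, 19, 9, 100],
        [6, 15, 16, 10, 9, 100]]

dataset = NextStepDataset(data, seq_length=2)
x, y = dataset[0]
print(x[:, 0].tolist(), y[0].item())   # [1.0, 2.0] 3.0 -> rows 1-2 predict row 3

xb, yb = next(iter(DataLoader(dataset, batch_size=4)))
print(xb.shape, yb.shape)              # torch.Size([4, 2, 6]) torch.Size([4, 6])
```

So the target batch has shape (batch_size, features). If only one column is to be predicted (say the close price, column index 2 in this made-up layout), y could be reduced to that single value and the target shape becomes (batch_size,).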

> I’ve been trying to amend this code to return x and y but I’m getting errors everywhere.

This is not really my area of expertise. However, if you still have trouble after reading the discussion here, try posting your code and your errors in this thread or in a separate topic on the forum, and we can try to fix them!

Cheers

billtubbs · December 8, 2019, 9:27pm

This code and description is very helpful thanks. One question. What’s the logic for having the listToTensor transformer that generates tensors from the data during training? Just because the source data in this case is a list of lists or a more fundamental reason? Also, wouldn’t torch.FloatTensor(data_as_a_list) work the same?

Crohn_eng · December 9, 2019, 10:53am

Hey @billtubbs,

thanks for the feedback, I really appreciate it!

> What’s the logic for having the listToTensor transformer that generates tensors from the data during training? Just because the source data in this case is a list of lists or a more fundamental reason?

Yes, the only reason is to convert the list of lists into a Tensor.

> Also, wouldn’t torch.FloatTensor(data_as_a_list) work the same?

It would work in exactly the same way, and it is surely a smarter and faster way of doing the conversion. Thank you for the suggestion!
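A quick check of that equivalence on two of the demo rows is sketched below. The loop function mirrors the listToTensor helper from the earlier post under a different, hypothetical name; torch.tensor with an explicit dtype is included as a third option.

```python
import torch


def list_to_tensor_loop(rows):
    """Row-by-row conversion, mirroring the listToTensor helper."""
    tensor = torch.empty(len(rows), len(rows[0]))
    for i, row in enumerate(rows):
        tensor[i, :] = torch.FloatTensor(row)
    return tensor


rows = [[1, 10, 11, 15, 9, 100],
        [2, 11, 12, 16, 9, 100]]

a = list_to_tensor_loop(rows)
b = torch.FloatTensor(rows)                   # one-call conversion
c = torch.tensor(rows, dtype=torch.float32)   # same result with an explicit dtype

print(torch.equal(a, b), torch.equal(b, c))   # True True
print(b.shape, b.dtype)                       # torch.Size([2, 6]) torch.float32
```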

Cheers!

Wesley_Neill · July 15, 2020, 4:36pm

This is a great question, one that I am struggling with too.

I see maybe you figured some things out for yourself. I wish that someone with some definitive knowledge had answered it for you. How sure are you that your approach is now correct?

I’m afraid that the LSTM model in pytorch has been very hard for me to wrap my head around compared to CNNs.

aatrey · August 12, 2020, 9:08pm

Thank you, this was very helpful! I was wondering if there is a way to remove the last sequences once we hit the idx + self.seq_length condition instead of appending 0s as it does now.

Crohn_eng · August 15, 2020, 2:35pm

Hey @aatrey,

Thanks for the feedback!

> I was wondering if there is a way to remove the last sequences once we hit the idx + self.seq_length condition instead of appending 0s as it does now.

I think there are several ways of solving this. The first two that come to mind:

- use torch.nn.utils.rnn.pad_sequence to build batches of same-length sequences from sequences of variable length, together with pack_padded_sequence so the LSTM skips the padded steps and pad_packed_sequence to turn the packed output back into a padded tensor (you can think of the last two operations as the inverse of each other);
- since the root problem is that the default DataLoader collation does not handle elements of variable size, pass the DataLoader a custom collate_fn that handles variable-size inputs (take a look here).
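The two suggestions can be combined, as in the sketch below: a custom collate_fn pads each batch, and the padded batch is packed before it reaches the LSTM. The sequences are random made-up data and `collate_variable_length` is a hypothetical name; treat this as an illustration under those assumptions rather than a tested recipe for the demo dataset above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence
from torch.utils.data import DataLoader


def collate_variable_length(batch):
    """Hypothetical collate_fn: zero-pad to the longest sequence, keep true lengths."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True)   # (batch, max_len, features)
    return padded, lengths


# Three made-up sequences of 3, 2 and 1 time steps, 6 features each
sequences = [torch.randn(n, 6) for n in (3, 2, 1)]
loader = DataLoader(sequences, batch_size=3, collate_fn=collate_variable_length)

lstm = nn.LSTM(input_size=6, hidden_size=12, batch_first=True)

for padded, lengths in loader:
    # Packing lets the LSTM skip the padded time steps
    packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
    packed_out, (hn, cn) = lstm(packed)
    # Unpack back to a zero-padded tensor plus the original lengths
    output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
    print(padded.shape, output.shape, out_lengths.tolist())
    # torch.Size([3, 3, 6]) torch.Size([3, 3, 12]) [3, 2, 1]
```

To drop incomplete windows altogether rather than pad them, shortening the dataset's __len__ so that only full windows are indexed is usually simpler.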

Hope these suggestions can be a good start for your troubleshooting, sorry if my answer couldn’t be more detailed but it has been quite a while since the last time I’ve worked with LSTM in PyTorch, and I don’t want to give you any bad advice.

Cheers

hpf · August 27, 2021, 3:42am

Hi Jason_Hinson, thank you. Finally, someone is complaining about the ambiguity of what pytorch means by 【seq_len, batch_size, input_size】. I don’t understand why pytorch explains the parameters seq_len and batch_size so poorly that it leaves people confused.