Understanding LSTM input

Hi Jason,

I am really sorry for answering with such a long delay; I have been really busy over the Christmas holidays.
In the meantime I managed to get some LSTM networks working (not sure if properly…), but still, if you need help, or if you have figured out how to use LSTMs and would like to compare your work with mine, I will try to answer the questions you asked in the first post.

when pytorch asks for “seq_length” is it asking for how long the window/batch is or how many batches are in the dataset?

For your first question, I would say that seq_length corresponds to the length of each window, not to the number of batches in the dataset.

Looking at this post ( Why 3d input tensors in LSTM? ), what I have understood is that using PyTorch’s DataLoaders you can divide your dataset into batches, and each element of a batch can contain a window of seq_length samples.

Let’s make an example: suppose that your dataset contains 10 samples, you would like to have 5 batches, and each element of a batch should be a sequence of 2 samples.
Since you would like to have 5 batches out of 10 samples, your batch_size will be equal to 2, and with each element of the batch being a window/sequence of 2 samples, each batch will look something along the lines of:

tensor([[[  1.,  10.,  11.,  15.,   9., 100.],
         [  2.,  11.,  12.,  16.,   9., 100.]],

        [[  2.,  11.,  12.,  16.,   9., 100.],
         [  3.,  12.,  13.,  17.,   9., 100.]]])

therefore, with dimensions (2=batch_size, 2=seq_length, 6=input_size).
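
Just to show how such a tensor is then consumed, here is a minimal sketch (hidden_size=4 and the lstm variable name are arbitrary choices of mine, not something from your code) of an LSTM built with batch_first=True being fed that exact (batch_size, seq_length, input_size) tensor:

import torch
import torch.nn as nn

# Toy LSTM matching the 6 input features of the example above;
# hidden_size=4 is an arbitrary value chosen just for this sketch
lstm = nn.LSTM(input_size=6, hidden_size=4, batch_first=True)

x = torch.tensor([[[  1.,  10.,  11.,  15.,   9., 100.],
                   [  2.,  11.,  12.,  16.,   9., 100.]],

                  [[  2.,  11.,  12.,  16.,   9., 100.],
                   [  3.,  12.,  13.,  17.,   9., 100.]]])

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([2, 2, 4]) -> (batch_size, seq_length, hidden_size)
print(h_n.shape)     # torch.Size([1, 2, 4]) -> (num_layers, batch_size, hidden_size)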

In order to build batches like this, you have to customize your Dataset class so that each __getitem__ call returns a window of elements (whose length corresponds to seq_length), while with the DataLoader you can play with the batch size in order to choose how many windows end up in each batch.
I wrote this little demo using the data you provided before, so that my example may be clearer to you:

import torch
import torch.utils.data as Data
import torch.nn as nn
import torchvision.transforms as transforms

###   Demo dataset

data_from_csv = [[1, 10, 11, 15, 9, 100],
                 [2, 11, 12, 16, 9, 100],
                 [3, 12, 13, 17, 9, 100],
                 [4, 13, 14, 18, 9, 100],
                 [5, 14, 15, 19, 9, 100],
                 [6, 15, 16, 10, 9, 100],
                 [7, 15, 16, 10, 9, 100],
                 [8, 15, 16, 10, 9, 100],
                 [9, 15, 16, 10, 9, 100],
                 [10, 15, 16, 10, 9, 100]]


###   Demo Dataset class

class DemoDatasetLSTM(Data.Dataset):

    """
        Support class for the loading and batching of sequences of samples

        Args:
            dataset (Tensor): Tensor containing all the samples
            sequence_length (int): length of the analyzed sequence by the LSTM
            transforms (object torchvision.transform): Pytorch's transforms used to process the data
    """

    ##  Constructor
    def __init__(self, dataset, sequence_length=1, transforms=None):
        self.dataset = dataset
        self.seq_length = sequence_length
        self.transforms = transforms

    ##  Override total dataset's length getter
    def __len__(self):
        return len(self.dataset)

    ##  Override single item's getter: each call returns one window of seq_length samples
    ##  (here the window is returned twice, as both input and target, just for the demo)
    def __getitem__(self, idx):
        if idx + self.seq_length > len(self.dataset):
            # The window would run past the end of the dataset
            if self.transforms is not None:
                # Zero-pad the window so that it still has length seq_length
                item = torch.zeros(self.seq_length, len(self.dataset[0]))
                item[:len(self.dataset) - idx] = self.transforms(self.dataset[idx:])
                return item, item
            else:
                # Without transforms, simply return the shorter window
                item = self.dataset[idx:]
                return item, item
        else:
            if self.transforms is not None:
                item = self.transforms(self.dataset[idx:idx + self.seq_length])
                return item, item
            else:
                item = self.dataset[idx:idx + self.seq_length]
                return item, item


###   Helper for transforming the data from a list of lists to a Tensor

def listToTensor(samples):
    tensor = torch.empty(len(samples), len(samples[0]))
    for i, sample in enumerate(samples):
        tensor[i, :] = torch.FloatTensor(sample)
    return tensor

###   Dataloader instantiation

# Parameters
seq_length = 2
batch_size = 2
data_transform = transforms.Lambda(lambda x: listToTensor(x))

dataset = DemoDatasetLSTM(data_from_csv, seq_length, transforms=data_transform)
data_loader = Data.DataLoader(dataset, batch_size, shuffle=False, num_workers=2)

for data in data_loader:
    x, _ = data
    print(x)

As you can see, batch_size controls how many windows end up in each batch returned by the DataLoader, while the length of each single window (seq_length) is imposed directly by customizing the Dataset class.
For instance, you may want 5 batches of 2 windows of length 2 as before (batch_size=2, seq_length=2), or 10 batches of a single window of length 2 (batch_size=1, seq_length=2): it really depends on the specific nature of your problem and of your dataset.
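
As a quick check, reusing the dataset object from the demo above, you can see how changing batch_size alone changes what the DataLoader yields:

# Reuses `dataset` from the demo above; only batch_size changes
for bs in (2, 1):
    loader = Data.DataLoader(dataset, batch_size=bs, shuffle=False)
    x, _ = next(iter(loader))
    print(bs, x.shape)

# 2 torch.Size([2, 2, 6])  -> 5 batches of 2 windows each
# 1 torch.Size([1, 2, 6])  -> 10 batches of a single window each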

Be careful though: as you can see, it may happen that not all your windows end up with the same length!
In the example above, for instance, the last element of the last batch would be composed of a single-sample window (since we would have reached the end of the dataset): in this case, you have to provide some rule to make all the sequences the same length, since the DataLoader won’t collate batches of elements with different dimensions.
That’s why I wrote those checks on the value of idx + self.seq_length, which zero-pad the last window.
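
If zero-padding is not what you want, a possible alternative (just a sketch, assuming it is acceptable for your problem to drop the last incomplete window) is to not expose the indices that cannot produce a full window, so that the padding branch is never reached; DemoDatasetLSTMNoPadding is just a name I made up for this example:

class DemoDatasetLSTMNoPadding(DemoDatasetLSTM):

    ##  Only expose indices that can yield a complete window of seq_length samples
    def __len__(self):
        return len(self.dataset) - self.seq_length + 1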

when pytorch asks for “batch” is it asking for the whole window of data?
when pytorch asks for “features” is it talking about number of columns of data like time, open, close etc.? is that the same as “input_size”?

I hope that the explanation I gave you before answers your second question: each element of the batch can be a window of data, it’s up to you.
For the last one, yes, when PyTorch asks for features it is talking about the number of columns (time, open, close, etc.), which is the same as “input_size”.

I hope this comment is helpful to you (and not too late): let me know what you think about it, or if I have made any errors, either in my example code or in my understanding of how LSTMs work.
I wish you and your family happy holidays!
