How to set the sequence length in an LSTM

Hi, I am trying to use an LSTM for a regression task (2 targets).
My input data has these columns: (Index, value1, value2, value3, value4, value5, target1, target2).

At every time step I want to pass the LSTM an array like this:
[0, 0, 0, 0, 0], where each number is the index of a row and stands for that row's 5 values (value1, value2, value3, value4, value5) without the targets (so here the first row, repeated);
at the second time step:
[0, 0, 0, 0, 1], where the 1 stands for the 5 values of the second row (again without the targets);
and so on.
So this is my Dataset class:

from torch.utils.data.dataset import Dataset
import numpy as np
import pandas as pd
from os import path
import torch

# Dataset class that loads the features

class Dataset(Dataset):

    """
    Carica il dataset per l'eseprimento  che ci permette di caricare le feature
    """

    def __init__(self, base_path, csv_Name):

        """Input:
            dataset CSV
        """
        self.base_path = base_path
        self.csv_Name = csv_Name
        path_finale = path.join(base_path, csv_Name)

        self.file = pd.read_csv(path_finale)

    def __getitem__(self, index):
        # print(index)
        # Take the current row and the 4 previous ones (clamped to 0 near the start of the file)
        index1 = max(index - 4, 0)
        index2 = max(index - 3, 0)
        index3 = max(index - 2, 0)
        index4 = max(index - 1, 0)
        index5 = index
        name1, rss11, rss21, rss31, rss41, rss51, coordinateX1, coordinateY1 = self.file.iloc[index1]
        name2, rss12, rss22, rss32, rss42, rss52, coordinateX2, coordinateY2 = self.file.iloc[index2]
        name3, rss13, rss23, rss33, rss43, rss53, coordinateX3, coordinateY3 = self.file.iloc[index3]
        name4, rss14, rss24, rss34, rss44, rss54, coordinateX4, coordinateY4 = self.file.iloc[index4]
        name5, rss15, rss25, rss35, rss45, rss55, coordinateX5, coordinateY5 = self.file.iloc[index5]
        # Row indices that make up this window
        names = [int(name1), int(name2), int(name3), int(name4), int(name5)]

        array1 = torch.tensor([rss11, rss21, rss31, rss41, rss51])
        array2 = torch.tensor([rss12, rss22, rss32, rss42, rss52])
        array3 = torch.tensor([rss13, rss23, rss33, rss43, rss53])
        array4 = torch.tensor([rss14, rss24, rss34, rss44, rss54])
        array5 = torch.tensor([rss15, rss25, rss35, rss45, rss55])
        # Concatenate the 5 rows into a single time step of 25 features
        array = torch.cat((array1, array2, array3, array4, array5))
        array = array.reshape(1, 25)

        # Targets are the coordinates of the last row in the window
        coordinateX = coordinateX5
        coordinateY = coordinateY5

        return {"ID": names, "Array": array, 'Movement': np.array([coordinateX, coordinateY], dtype='float')}

    def __len__(self):
        return len(self.file)
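
For reference, a quick shape check of one sample (just a sketch; it assumes the same CSV path I use further down):

dataset = Dataset('../Dataset/', '121_train_seq.csv')
sample = dataset[10]
print(sample["ID"])           # the 5 row indices that make up the window
print(sample["Array"].shape)  # torch.Size([1, 25]): one time step of 25 features
print(sample["Movement"])     # [coordinateX, coordinateY] of the last row in the window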

My LSTM model looks like this:

import torch
import torch.nn as nn
class LSTM_CLASS(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM_CLASS, self).__init__()
        self.num_classes = num_classes #2
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size


        self.lstm = nn.LSTM(input_size= self.input_size,
                            hidden_size=self.hidden_size,
                            num_layers= self.num_layers,
                            batch_first=True)

        self.fc1 = nn.Linear(hidden_size, hidden_size)

        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size); h_out: (num_layers, batch, hidden_size)
        ula, (h_out, _) = self.lstm(x)
        [_, _, features] = h_out.shape
        out = self.fc1(h_out)
        out = out.view(-1, features)  # drop the num_layers dimension -> (batch, hidden_size)
        relu = nn.ReLU()
        out = relu(out)
        out = self.fc2(out)

        return out
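
To make the shapes explicit, here is how a dummy batch flows through the forward pass (a sketch, with the same hyperparameters I use below):

model = LSTM_CLASS(num_classes=2, input_size=25, hidden_size=128, num_layers=1)
x = torch.randn(16, 1, 25)  # (batch, seq_len=1, input_size=25), as produced by the Dataset
out = model(x)              # h_out is (1, 16, 128); view(-1, 128) drops the num_layers dim
print(out.shape)            # torch.Size([16, 2]): one (coordinateX, coordinateY) per sample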

and I instantiate the model and create the DataLoaders like this:

    model = LSTM.LSTM_CLASS(num_classes = 2, input_size=25, hidden_size=128, num_layers=1)

    train_dataset = Dataset('../Dataset/','121_train_seq.csv')
    valid_dataset = Dataset('../Dataset/','121_valid_seq.csv')


    train_loader = DataLoader(train_dataset, batch_size=16, num_workers=2)
    valid_loader = DataLoader(valid_dataset, batch_size=16, num_workers=2)

I don't understand why (with shuffle=False on the DataLoader), when I print the indices in __getitem__ of the Dataset class, I get this situation:
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 1]
[0, 0, 0, 1, 2]
[0, 0, 1, 2, 3]
[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
[6, 7, 8, 9, 10]
[7, 8, 9, 10, 11]
[8, 9, 10, 11, 12]
[9, 10, 11, 12, 13]
[10, 11, 12, 13, 14]
[11, 12, 13, 14, 15]
Why is there a skip of about 16 (my batch size?) from here on:
[28, 29, 30, 31, 32]
[29, 30, 31, 32, 33]
[30, 31, 32, 33, 34]
[31, 32, 33, 34, 35]
[32, 33, 34, 35, 36]
[33, 34, 35, 36, 37]
[34, 35, 36, 37, 38]
[35, 36, 37, 38, 39]
[36, 37, 38, 39, 40]
[37, 38, 39, 40, 41]
[38, 39, 40, 41, 42]
[39, 40, 41, 42, 43]
[40, 41, 42, 43, 44]
[41, 42, 43, 44, 45]
[42, 43, 44, 45, 46]
[43, 44, 45, 46, 47]
[60, 61, 62, 63, 64]
[61, 62, 63, 64, 65]
[62, 63, 64, 65, 66]
[63, 64, 65, 66, 67]
[64, 65, 66, 67, 68]
[65, 66, 67, 68, 69]
[66, 67, 68, 69, 70]
[67, 68, 69, 70, 71]
[68, 69, 70, 71, 72]

Where am I going wrong?

For completeness, this is my training procedure:

def train(model, train_loader, valid_loader, exp_name = "LSTM",  lr=0.0001, epochs=1000, wd = 0.000001):
    criterionX = nn.SmoothL1Loss()
    criterionZ = nn.SmoothL1Loss() 

    optimizer = Adam(params=model.parameters(),lr = lr, weight_decay=wd)
    scheduler = StepLR(optimizer, step_size=50, gamma=0.5)  # every 50 epochs, halve the learning rate

    # meters
    lossX_meter = AverageValueMeter()
    lossZ_meter = AverageValueMeter()
    lossT_meter = AverageValueMeter()

    # device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    loader = {"train": train_loader, "test": valid_loader}

    loss_X_logger = VisdomPlotLogger('line', env=exp_name, opts={'title': 'LossX', 'legend': ['train', 'test']})
    loss_Z_logger = VisdomPlotLogger('line', env=exp_name, opts={'title': 'LossZ', 'legend': ['train', 'test']})
    loss_T_logger = VisdomPlotLogger('line', env=exp_name, opts={'title': 'Total_Loss', 'legend': ['train', 'test']})
    visdom_saver = VisdomSaver(envs=[exp_name])
    for e in range(epochs):
        for mode in ["train", "test"]:
            lossX_meter.reset()
            lossZ_meter.reset()
            lossT_meter.reset()
            model.train() if mode == "train" else model.eval()
            with torch.set_grad_enabled(mode == "train"):  # enable gradients only in training
                for i, batch in enumerate(loader[mode]):
                    x = batch["Array"].to(device)
                    dx = batch['Movement'][:, 0].float().to(device)
                    dz = batch['Movement'][:, 1].float().to(device)
                    output= model(x)

                    out1, out2 = output[:, 0], output[:, 1]

                    l1 = criterionX(out1, dx)
                    l2 = criterionZ(out2, dz)
                    loss = l1+l2

                    if mode == "train":
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()
                        
                    n = x.shape[0]  # number of elements in the batch

                    lossX_meter.add(l1.item() * n, n)  # update meters for plotting
                    lossZ_meter.add(l2.item() * n, n)
                    lossT_meter.add(loss.item()* n, n)
                    if mode == "train":
                        loss_X_logger.log(e + (i + 1) / len(loader[mode]), lossX_meter.value()[0], name=mode)
                        loss_Z_logger.log(e + (i + 1) / len(loader[mode]), lossZ_meter.value()[0], name=mode)
                        loss_T_logger.log(e + (i + 1) / len(loader[mode]), lossT_meter.value()[0], name=mode)
            loss_X_logger.log(e + (i + 1) / len(loader[mode]), lossX_meter.value()[0], name=mode)
            loss_Z_logger.log(e + (i + 1) / len(loader[mode]), lossZ_meter.value()[0], name=mode)
            loss_T_logger.log(e + (i + 1) / len(loader[mode]), lossT_meter.value()[0], name=mode)


        scheduler.step()
        visdom_saver.save()
        torch.save(model.state_dict(), '%s.pth' % exp_name)



    return model

Can you help me?

Have you tried num_workers=1 and does that still give the same issue?
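
With num_workers=2 each worker process prefetches whole batches in parallel, so the prints coming from __getitem__ in the two workers interleave in the console and can look like skipped indices, even though every index is still served exactly once. A small self-contained sketch (with a hypothetical toy dataset) that tags each print with the worker id, using torch.utils.data.get_worker_info():

import torch
from torch.utils.data import Dataset as TorchDataset, DataLoader

class IndexDataset(TorchDataset):
    # Toy dataset that only reports which worker served which index
    def __len__(self):
        return 64

    def __getitem__(self, index):
        info = torch.utils.data.get_worker_info()
        worker_id = info.id if info is not None else -1  # -1 means "main process" (num_workers=0)
        print(f"worker {worker_id} served index {index}")
        return index

if __name__ == "__main__":
    loader = DataLoader(IndexDataset(), batch_size=16, num_workers=2)
    for _ in loader:
        pass

With num_workers=1 (or 0) there is a single producer, so the prints come back strictly in order.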


On a side note, is there any reason you’re disposing of the hidden state and cell state during your forward pass? I’m not even sure how you got that forward pass to run, to be honest, if you even got that far. See the example at the bottom here regarding hidden and cell state:

https://pytorch.org/docs/1.9.1/generated/torch.nn.LSTM.html

In the forward pass I pass a (BS x length x features) tensor.
I saw all the examples for creating a c_0 and an h_n, but I only need the h_n, and I attach a couple of FC layers to it.
It works with your suggestion (num_workers=1), so thank you very much :slight_smile:


It may somewhat defeat the purpose of using an LSTM if you are tossing the cell state. Have you checked GRU? Or you may try just using 2 linear layers and see if it gets the same or better results, since you're not passing the hidden state anyway and you're not positionally encoding the data.
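
If you want to try that, here is a minimal sketch of the 2-linear-layer baseline (assuming the same 25 input features and 2 outputs as the LSTM model above):

import torch.nn as nn

# Plain feed-forward baseline: same 25-feature input, same 2 outputs
mlp_baseline = nn.Sequential(
    nn.Flatten(),        # (BS, 1, 25) -> (BS, 25)
    nn.Linear(25, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)

nn.GRU is a near drop-in replacement for nn.LSTM as well; it returns only output and h_n, with no cell state.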

I use both.
For my topic I am interested in the hidden state of the LSTM.
In the LSTM as I implemented it, thanks to PyTorch, the cell state and the initial hidden state are initialized automatically.
As you said, I get better results for every batch.