LSTM doesn't train

Hello everybody,

I learned Keras and now I am learning PyTorch; I am a beginner. I tried to use an LSTM (both in Keras and PyTorch), and the PyTorch one doesn’t train. I know approximately how the loss and the accuracy should evolve with Keras, and here they don’t change during the epochs, so I assume my PyTorch code is not correct. I just want to use one LSTM layer with 256 units and one linear layer.
This is my PyTorch code :

I want to do classification with 3 classes, and the data are 1D time series with 17908 values each. The training set contains 14001 time series, so the matrix has shape 14001x17908.
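For reference, here is a minimal sketch of the setup described above (one LSTM layer with 256 units feeding one linear layer for 3 classes); the class name, hidden size, and shapes are assumptions based on this description, not the original code:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """One LSTM layer (256 hidden units) followed by one linear layer."""
    def __init__(self, n_features, hidden_size=256, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: [batch, seq_len, n_features] because batch_first=True
        out, _ = self.lstm(x)
        # classify from the output of the last time step
        return self.fc(out[:, -1, :])

model = LSTMClassifier(n_features=17908)
x = torch.randn(4, 1, 17908)  # [batch=4, seq_len=1, features=17908]
logits = model(x)
print(logits.shape)           # torch.Size([4, 3])
```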

Thanks for your help :smile: :grin:

Based on your description and this code:

X_valid = np.reshape(X_valid, (X_valid.shape[0], 1, X_valid.shape[1]))
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))

as well as the usage of batch_first=True in the nn.LSTM module, the input data should have a shape of [batch_size=14001, seq_len=1, nb_features=17908].
If that’s correct, note that you would only be using a single time step, so the nn.LSTM module might not be really useful.
Could you explain the approach you’ve used in Keras/TF, i.e. which shapes (in particular sequence lengths) were used there?

PS: unrelated to this issue, but it seems you would like to transform the numpy arrays to tensors here:

X_train, X_valid = [torch.tensor(arr, dtype=torch.float32) for arr in (X_train, X_valid)]
y_train, y_valid = [torch.tensor(arr, dtype=torch.long) for arr in (target_train, target_valid)]

If so, you could use:

X_train = torch.from_numpy(X_train).float()
X_valid = ...
y_train = torch.from_numpy(y_train).long()
y_valid = ...
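A small demonstration of the practical difference between the two approaches (an illustrative sketch, not from the original code): torch.from_numpy shares memory with the numpy array, while torch.tensor makes a copy.

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of arr

arr[0] = 1.0                    # mutate the numpy array
print(shared[0].item())         # 1.0 — reflects the numpy change
print(copied[0].item())         # 0.0 — unaffected copy
```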

Thanks for your answer :grin: !!

The Keras approach I used:

I think I use the same sequence lengths…

Thanks !!

Thanks for the advice about the tensor transformation :slight_smile:

In fact it is not really a “time series” but a probability density function (in log), so it is complicated to have several sequence lengths, I think…

Thanks for the update.
Based on the Keras LSTM docs it seems that the input should have the shape:

inputs: A 3D tensor with shape [batch, timesteps, feature].

The linked Keras implementation uses tensors as:

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_validation = np.reshape(X_validation, (X_validation.shape[0], X_validation.shape[1], 1))

so it seems that the feature dimension is set to one and the temporal dimension is large.
If I’m not misunderstanding the Keras docs or the posted Keras code, I guess this would be the main difference between both codes.
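To make the difference concrete, here is a small sketch (shapes taken from the thread) of how the same batch of series would be laid out for each approach in PyTorch:

```python
import torch

x = torch.randn(8, 17908)        # 8 example series of length 17908

# Keras-like layout: long sequence, one feature per time step
x_keras_like = x.unsqueeze(-1)   # [8, 17908, 1]

# Layout in the posted PyTorch code: one time step, 17908 features
x_current = x.unsqueeze(1)       # [8, 1, 17908]

print(x_keras_like.shape)        # torch.Size([8, 17908, 1])
print(x_current.shape)           # torch.Size([8, 1, 17908])
```

Note that with the Keras-like layout the nn.LSTM would need input_size=1 and would iterate over 17908 time steps, which can be quite slow.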

Refer to the docs of the torch LSTM for how the input data should be arranged. It is an excellent piece of documentation. It feels like the input data is not arranged properly in the code. Definitely use the batch_first=True option in the LSTM; it makes life a lot easier.

Docs: LSTM_Pytorch
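As a quick illustration of what batch_first changes (a minimal sketch with arbitrary small sizes): with batch_first=True the input is [batch, seq_len, features], while the default layout is [seq_len, batch, features].

```python
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
lstm_sf = nn.LSTM(input_size=1, hidden_size=4)  # default: seq-first

x_bf = torch.randn(2, 5, 1)    # [batch=2, seq_len=5, features=1]
x_sf = x_bf.transpose(0, 1)    # [seq_len=5, batch=2, features=1]

out_bf, _ = lstm_bf(x_bf)
out_sf, _ = lstm_sf(x_sf)
print(out_bf.shape)            # torch.Size([2, 5, 4])
print(out_sf.shape)            # torch.Size([5, 2, 4])
```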