What is the correct LSTM time-series input order?

ruzbehakbar · April 17, 2020, 5:25pm

Assuming we have a Sequence-to-Sequence LSTM model for time-series prediction:

Input time-series: X shaped as (batch_size, seq_length = N, input_dim = 1)
Output time-series: y shaped as (batch_size, seq_length = N, input_dim = 1)

I want to predict the time series y using N-lagged X data. What is the correct order (for preprocessing) of the input data fed into the LSTM model? In other words, in what direction are the data fed into LSTM models?

Should the input tensor/array, X, be ordered as:

Option 1:

input slice (X[0,:]) [x(t), x(t-1), …, x(t-N)]
output: [y(t), y(t-1), …, y(t-N)]

e.g., model([x(t), x(t-1), …, x(t-N)])

or, Option 2:

input slice (X[0,:]) [x(t-N),…, x(t-1), x(t)]
output: [y(t-N),…, y(t-1), y(t)]

e.g., model([x(t-N),…, x(t-1), x(t)])

Basically, for my input data, does the sequence go from x(t) to x(t-N) left to right, or vice versa?
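The two options contain the same samples and differ only in the direction of the time axis: Option 1 is Option 2 reversed. A minimal NumPy sketch, assuming a hypothetical toy series where x(t) = t and a hypothetical lag count N = 3:

```python
import numpy as np

# Hypothetical toy series: the value at time t is just t, so the ordering is easy to read.
series = np.arange(10, dtype=np.float32)
N = 3  # number of lags (hypothetical)
t = 9  # index of the "current" time step

# Option 2: oldest first, [x(t-N), ..., x(t-1), x(t)]
window_option2 = series[t - N : t + 1]

# Option 1: newest first, [x(t), x(t-1), ..., x(t-N)], i.e. Option 2 reversed
window_option1 = window_option2[::-1]

print(window_option2)  # [6. 7. 8. 9.]
print(window_option1)  # [9. 8. 7. 6.]
```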

sharvil · April 17, 2020, 6:06pm

Your sequence should be ordered [x(t-N), ..., x(t-1), x(t)]. Keep in mind that the sequence dimension should come first unless you specify batch_first=True on the LSTM layer. Your X and Y variables should be shaped [T, N, C], where T is the sequence length, N is the batch size, and C is the feature size.
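A unidirectional nn.LSTM steps through the time axis from index 0 to T-1, which is why the oldest sample belongs at index 0. The sketch below cuts a 1-D series into overlapping windows in that oldest-to-newest order and then swaps axes to get the default [T, N, C] layout. The helper `make_windows` and the toy data are hypothetical; target windows for y could be built the same way from the y series.

```python
import numpy as np

def make_windows(series: np.ndarray, seq_len: int) -> np.ndarray:
    """Cut a 1-D series into overlapping windows ordered oldest -> newest.

    Returns an array of shape [batch, seq_len, 1] (batch-first layout).
    """
    batch = len(series) - seq_len + 1
    windows = np.stack([series[i : i + seq_len] for i in range(batch)])
    return windows[:, :, np.newaxis]  # add the feature dimension C = 1

series = np.arange(6, dtype=np.float32)          # hypothetical toy data: x(t) = t
X_batch_first = make_windows(series, seq_len=3)  # [N, T, C] = [4, 3, 1]

# The default nn.LSTM (batch_first=False) expects [T, N, C]: swap the first two axes.
X_seq_first = X_batch_first.transpose(1, 0, 2)   # [T, N, C] = [3, 4, 1]

print(X_batch_first.shape, X_seq_first.shape)  # (4, 3, 1) (3, 4, 1)
print(X_batch_first[0, :, 0])                  # [0. 1. 2.]
```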

ruzbehakbar · April 17, 2020, 6:11pm

Thanks!

Just to clarify: if the input data shape is [T,N,C], then batch_first = False, and if it is [N,T,C], then batch_first = True.

T = sequence length
N = batch size
C = # of features
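The shape convention can be checked directly against nn.LSTM. A minimal sketch, assuming arbitrary hypothetical sizes T = 5, N = 4, C = 1 and hidden size H = 8: the output follows the batch_first layout, while the returned hidden and cell states are always [num_layers, N, H].

```python
import torch
import torch.nn as nn

T, N, C, H = 5, 4, 1, 8  # seq length, batch size, features, hidden size (hypothetical)

# batch_first=False (the default): input is [T, N, C]
lstm_seq_first = nn.LSTM(input_size=C, hidden_size=H)
out, (h_n, c_n) = lstm_seq_first(torch.randn(T, N, C))
print(out.shape)  # torch.Size([5, 4, 8])

# batch_first=True: input is [N, T, C]
lstm_batch_first = nn.LSTM(input_size=C, hidden_size=H, batch_first=True)
out_bf, (h_n_bf, c_n_bf) = lstm_batch_first(torch.randn(N, T, C))
print(out_bf.shape)  # torch.Size([4, 5, 8])

# batch_first does not change the layout of the hidden/cell states.
print(h_n.shape, h_n_bf.shape)  # torch.Size([1, 4, 8]) torch.Size([1, 4, 8])
```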

sharvil · April 17, 2020, 6:32pm

Yup, that’s right.
…