Multivariate data for LSTM training on PyTorch

I am working with a set of data for training a deep learning LSTM model in PyTorch. I have written a working model with a single variable as input, but I was wondering what the convention is for a multi-dimensional input tensor. I already know that the LSTM module in PyTorch accepts data of the form (batch_size, sequence_length, input_size); however, I'd like to use training data of the form

Date    x1      x2      x3
date1   data11  data12  data13
date2   data21  data22  data23
...     ...     ...     ...

I am using a moving window method to get sequences, and they are all stored in a large tensor of shape (5, 36070, 10, 1): 5 lists (one per variable) of 36070 windows, each 10 elements long. Because of the input shape the LSTM module expects, my initial approach was to reshape the large tensor to (180350, 10, 1), so that the 5 lists of 36070 windows are stacked along the batch dimension. Is this the correct approach? Or can the LSTM module's input_size accept a tensor shape instead of a scalar representing the size of a one-dimensional vector?
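For reference, a minimal sketch of the setup described above. The series lengths and window construction via `unfold` are assumptions chosen so the shapes match the question; the "inflated" reshape stacks the 5 variable lists along the batch axis:

```python
import torch

# Hypothetical data: 5 univariate series, each long enough to yield
# 36070 moving windows of length 10 with stride 1.
num_vars, num_windows, window_len = 5, 36070, 10
series = torch.randn(num_vars, num_windows + window_len - 1)

# Moving-window extraction: unfold along the time axis -> (5, 36070, 10)
windows = series.unfold(dimension=1, size=window_len, step=1)
windows = windows.unsqueeze(-1)                # (5, 36070, 10, 1)

# The "inflated" approach: stack the 5 variable lists on the batch axis
stacked = windows.reshape(-1, window_len, 1)   # (180350, 10, 1)
```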

The logic is the same as with a simple Linear layer: different variables should be fed through different vector positions, with the feature vectors occupying the last dimension.

So, with 5 variables, you need to permute the dataset to (*, 5).
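Assuming the windows are stored variables-first as (5, 36070, 10, 1) like in the question, a sketch of that permutation, so each time step carries a 5-element feature vector:

```python
import torch

# Assumed starting point: (variables, windows, time, 1)
windows = torch.randn(5, 36070, 10, 1)

# Drop the trailing singleton and move the variable axis to the end:
# (36070, 10, 5), i.e. (batch, time, features)
multivariate = windows.squeeze(-1).permute(1, 2, 0)
```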

If you're just unrolling univariate windows, your "inflated dataset" approach may work. There are some potential issues with mini-batch random sampling (windows from different variables get mixed into the same batches), but I think this won't play a big role with your dimension sizes.

Hi, what do you mean by "different data should be fed through different vector positions, with vectors occupying the last dimension"? What should the resulting shape of the input tensor be then?

If your data is tabular, with 5 explanatory factors, the input shape can be either (batch, time, 5) or (time, batch, 5). If you're unrolling values from a single-column dataset, ignore this; a last dimension of size 1 is correct.
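To tie the shapes together, a sketch of an LSTM consuming the (batch, time, 5) layout; `hidden_size=32` and the batch size are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

# batch_first=True makes the module expect (batch, time, features)
lstm = nn.LSTM(input_size=5, hidden_size=32, batch_first=True)

x = torch.randn(16, 10, 5)      # 16 sequences, 10 steps, 5 variables
out, (h_n, c_n) = lstm(x)
# out: (16, 10, 32) - hidden state at every time step
# h_n: (1, 16, 32)  - final hidden state per layer
```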