Why are the default input dimensions of LSTM [sequence_length, batch_size, feature_size]?

I think it would be more natural to use the data in the shape [batch_size, sequence_length, feature_size].

Before I comment on the principle, if your input_data is of shape [batch_size, sequence_length, feature_size], then input_data.permute(1, 0, 2) will transform it into shape [sequence_length, batch_size, feature_size].

I believe permute doesn’t copy the data; it just alters the strides used for the underlying array, so it is very cheap.
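
A minimal sketch illustrating this (the tensor sizes are made up for the example): permute returns a view over the same storage with rearranged strides, so no data is copied, but the result is no longer contiguous.

```python
import torch

# Hypothetical batch-first input: [batch_size, sequence_length, feature_size]
x = torch.randn(32, 100, 8)

# permute returns a view with rearranged strides; no data is copied
y = x.permute(1, 0, 2)                # [sequence_length, batch_size, feature_size]

print(x.stride())                     # (800, 8, 1)
print(y.stride())                     # (8, 800, 1) -- same storage, different strides
print(y.data_ptr() == x.data_ptr())   # True: both tensors share the same memory
print(y.is_contiguous())              # False: the view is not laid out contiguously
```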

I am running some analyses on very long time-series data and wanted to create sequential batches. I found that if my data was of shape [batch_size, sequence_length, feature_size], selecting slices of the form [:, start:end, :] gave me non-contiguous tensors that the model couldn’t use directly. So, to avoid having to copy the tensor just to make it contiguous, I first made sure my data was of shape [sequence_length, batch_size, feature_size], and then it all worked; see the sketch below.
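
A small sketch of the contiguity difference, using made-up sizes: slicing a time window along dim 1 of a batch-first tensor breaks contiguity, while the same window on a sequence-first tensor is a slice along dim 0 and stays contiguous.

```python
import torch

seq_len, batch_size, feat = 10_000, 16, 4   # hypothetical sizes for a long series

# Batch-first layout: a time window is a slice along dim 1 -> non-contiguous
x_bf = torch.randn(batch_size, seq_len, feat)
window_bf = x_bf[:, 0:200, :]
print(window_bf.is_contiguous())             # False

# Sequence-first layout: the same window is a slice along dim 0 -> still contiguous
x_sf = torch.randn(seq_len, batch_size, feat)
window_sf = x_sf[0:200, :, :]
print(window_sf.is_contiguous())             # True
```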

I also saw this with nn.MultiheadAttention. It’s still not clear to me why you would not have the first dimension be the batch size like for nn.Linear.

The layout is chosen for performance reasons, as also mentioned here.
Also, for RNNs you can pass batch_first=True to use batch-first shapes instead.
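
For example, a minimal sketch with batch_first=True (the sizes are arbitrary): the LSTM then accepts and returns batch-first tensors, while the hidden and cell states keep their [num_layers, batch, hidden] layout.

```python
import torch
import torch.nn as nn

# With batch_first=True the LSTM expects [batch_size, sequence_length, feature_size]
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(32, 100, 8)          # [batch, seq, feature]
output, (h_n, c_n) = lstm(x)

print(output.shape)                  # torch.Size([32, 100, 16]) -- also batch-first
print(h_n.shape)                     # torch.Size([1, 32, 16])   -- states stay [layers, batch, hidden]
```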