Why are the default input dimensions of an LSTM [sequence_length, batch_size, feature_size]?
I think it is more natural to use data in the shape of [batch_size, sequence_length, feature_size].
Before I comment on the principle: if your input_data is of shape [batch_size, sequence_length, feature_size], then input_data.permute(1, 0, 2) will transform it into shape [sequence_length, batch_size, feature_size].
I believe that permute doesn't copy the data; it just alters the strides used to index the underlying array, so it is very efficient.
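A minimal sketch of this, with made-up shapes (a batch of 32 sequences, length 100, 8 features), to show that permute returns a view over the same storage rather than a copy:

```python
import torch

x = torch.randn(32, 100, 8)            # [batch_size, sequence_length, feature_size]
y = x.permute(1, 0, 2)                 # [sequence_length, batch_size, feature_size]

print(y.shape)                         # torch.Size([100, 32, 8])
print(y.data_ptr() == x.data_ptr())    # True: same underlying storage, no copy made
print(y.is_contiguous())               # False: only the strides changed, not the memory
```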
I am running some analyses on very long time-series data and wanted to create sequential batches. I found that if my data was of shape [batch_size, sequence_length, feature_size], then selecting slices of the form [:, start:end, :] gave me non-contiguous tensors, which the model couldn't use directly. To avoid copying the tensor just to make it contiguous, I first made sure my data was of shape [sequence_length, batch_size, feature_size], so the same window became a leading-dimension slice [start:end, :, :], and then it all worked.
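A small illustration of that contiguity difference, again with hypothetical sizes (32 series of 10,000 steps, 8 features, windows of 100 steps): slicing the middle dimension leaves gaps between rows in memory, while slicing the leading dimension yields one solid block.

```python
import torch

# Batch-first layout: a window along the middle (time) dimension is non-contiguous.
data_bsf = torch.randn(32, 10_000, 8)   # [batch_size, sequence_length, feature_size]
window = data_bsf[:, 0:100, :]
print(window.is_contiguous())           # False: each batch row skips ahead in memory

# Sequence-first layout: the same window is a leading-dimension slice.
data_sbf = torch.randn(10_000, 32, 8)   # [sequence_length, batch_size, feature_size]
window = data_sbf[0:100, :, :]
print(window.is_contiguous())           # True: one contiguous block, no copy needed
```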
I also saw this with nn.MultiheadAttention. It's still not clear to me why you would not have the first dimension be the batch size, as for nn.Linear.
The layout is chosen for performance reasons, as also explained here.
Also, for RNNs you can use batch_first=True to change the shapes.
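For example, a sketch with the same hypothetical sizes as above (input_size=8, hidden_size=16 are arbitrary choices):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(32, 100, 8)          # [batch_size, sequence_length, feature_size]
output, (h_n, c_n) = lstm(x)

print(output.shape)                  # torch.Size([32, 100, 16]), batch dimension first
print(h_n.shape)                     # torch.Size([1, 32, 16])
```

Note that batch_first=True only affects the input and output tensors; the hidden states h_n and c_n keep their [num_layers, batch_size, hidden_size] layout.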