Why are the dimensions of input/output in LSTM ordered the way they are?

Hi! I’m a newbie and I’m still learning PyTorch. I noticed that the input and output of nn.LSTM have their dimensions in a specific order: (sequence length, batch size, input size). May I ask why it is defined this way?

I’m more familiar with the conv modules, where the batch size usually comes first (e.g. batch_size x channels x (D) x (H) x W), so it feels a bit unintuitive to me that the ordering is different for LSTM.

I do realize you can set the batch_first argument when constructing nn.LSTM, but since its default is False, I’m wondering whether there is a specific reason behind this ordering of dimensions. A small sketch of what I mean by the two orderings is below.
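Just to make the shapes concrete (the sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20

# Default layout: input is (seq_len, batch, input_size)
lstm = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch_size, input_size)
out, (h, c) = lstm(x)
print(out.shape)     # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size)

# With batch_first=True: input is (batch, seq_len, input_size), like the conv modules
lstm_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
x_bf = torch.randn(batch_size, seq_len, input_size)
out_bf, (h_bf, c_bf) = lstm_bf(x_bf)
print(out_bf.shape)  # torch.Size([3, 5, 20]) -> (batch, seq_len, hidden_size)
```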

RNNs are computed with a loop that unrolls the time dimension, so with a time-major layout (sequence length first) each time step the RNN reads is a contiguous block of memory. How important that is depends on the implementation, the device, and the tensor size (i.e. whether it fits in cache).
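As a rough illustration of the memory-layout point (a toy unrolled loop with made-up weights, not PyTorch’s actual LSTM implementation):

```python
import torch

seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20

# Time-major layout (the default): slicing out one time step gives a
# contiguous block of memory.
x_tm = torch.randn(seq_len, batch_size, input_size)
print(x_tm[0].is_contiguous())     # True

# Batch-major layout: the same time step is strided across the batch,
# so the slice is not contiguous.
x_bm = torch.randn(batch_size, seq_len, input_size)
print(x_bm[:, 0].is_contiguous())  # False

# An unrolled RNN loops over time steps, so with the time-major layout
# each iteration reads one contiguous chunk x_tm[t] of shape (batch, input_size).
W_ih = torch.randn(input_size, hidden_size)   # made-up input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # made-up hidden-to-hidden weights
h = torch.zeros(batch_size, hidden_size)
for t in range(seq_len):
    h = torch.tanh(x_tm[t] @ W_ih + h @ W_hh)
```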