Hi! I’m a newbie and I’m still learning about PyTorch. I noticed that the input and output of LSTM in PyTorch have their dimensions in a specific order: (sequence length, batch size, input size). May I ask why it is defined this way?
I’m more familiar with the conv modules, where the batch size usually comes first (e.g. batch_size x channels x (D) x (H) x W). So it feels a bit unintuitive to me that the ordering is different for LSTM.
I do realize you can set a batch_first argument when constructing nn.LSTM, but since that arg defaults to False, I’m just wondering if there is any specific reason behind this ordering of dims.
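To make the difference I mean concrete, here is a small sketch (the sizes 5, 3, 10, 20 are just arbitrary numbers for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20

# Default (batch_first=False): input is (seq_len, batch, input_size)
lstm = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch_size, input_size)
out, (h, c) = lstm(x)
print(out.shape)  # (seq_len, batch, hidden_size) -> torch.Size([5, 3, 20])

# With batch_first=True: input is (batch, seq_len, input_size), like conv modules
lstm_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
x_bf = torch.randn(batch_size, seq_len, input_size)
out_bf, _ = lstm_bf(x_bf)
print(out_bf.shape)  # (batch, seq_len, hidden_size) -> torch.Size([3, 5, 20])
```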