Why does batch_size come after seq_length?

Hi! I’m just curious why it was made this way - it seems counterintuitive to me, and sometimes leads to conceptual confusion. Is there any idea behind it?

The reason is that the cuDNN backend defines the parameter ordering this way: it expects time-major input, i.e. (seq_len, batch, input_size), so PyTorch adopted that as the default layout.

There is a ‘batch_first’ option which you can set to True so that your input tensor can have its first dimension equal to the batch_size.
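A minimal sketch contrasting the two layouts with `nn.RNN` (the shapes shown are assumptions about a single-layer, unidirectional RNN):

```python
import torch
import torch.nn as nn

seq_len, batch, feat, hidden = 5, 3, 10, 20

# Default (time-major) layout: input is (seq_len, batch, input_size)
rnn = nn.RNN(input_size=feat, hidden_size=hidden)
out, h = rnn(torch.randn(seq_len, batch, feat))
print(out.shape)  # (seq_len, batch, hidden_size)

# batch_first=True: input is (batch, seq_len, input_size)
rnn_bf = nn.RNN(input_size=feat, hidden_size=hidden, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(batch, seq_len, feat))
print(out_bf.shape)  # (batch, seq_len, hidden_size)

# Note: the hidden state h keeps the shape
# (num_layers, batch, hidden_size) in both cases.
print(h.shape, h_bf.shape)
```

Note that `batch_first` only affects the input and output tensors; the returned hidden state keeps batch as its second dimension either way.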

Thank you for the answer!