Why does batch_size come after seq_length?

Hi! I’m just curious why it was made this way - it seems counterintuitive to me, and sometimes leads to conceptual confusion. Is there any idea behind it?

The reason is that the cuDNN backend defines the parameter ordering this way: it expects time-major input, i.e. (seq_len, batch, input_size), so PyTorch adopted that as the default layout.

There is a ‘batch_first’ option which you can set to True so that your input tensor can have its first dimension equal to the batch_size.
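A minimal sketch contrasting the two layouts with `nn.RNN` (the shapes shown are assumptions about a single-layer, unidirectional RNN):

```python
import torch
import torch.nn as nn

seq_len, batch, feat, hidden = 5, 3, 10, 20

# Default (time-major) layout: input is (seq_len, batch, input_size)
rnn = nn.RNN(input_size=feat, hidden_size=hidden)
out, h = rnn(torch.randn(seq_len, batch, feat))
print(out.shape)  # (seq_len, batch, hidden_size)

# batch_first=True: input is (batch, seq_len, input_size)
rnn_bf = nn.RNN(input_size=feat, hidden_size=hidden, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(batch, seq_len, feat))
print(out_bf.shape)  # (batch, seq_len, hidden_size)

# Note: the hidden state h keeps the shape
# (num_layers, batch, hidden_size) in both cases.
print(h.shape, h_bf.shape)
```

Note that `batch_first` only affects the input and output tensors; the returned hidden state keeps batch as its second dimension either way.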

Thank you for the answer!