Hi! I’m just curious why it’s made this way. It seems counterintuitive to me, and it sometimes leads to conceptual confusion. Is there a reason behind it?
The reason is that the cuDNN backend defines the parameter ordering this way:
http://docs.nvidia.com/deeplearning/sdk/cudnn-user-guide/index.html#cudnnGetRNNWorkspaceSize
There is a ‘batch_first’ option which you can set to True so that the first dimension of your input tensor is the batch size.
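In case it helps, here is a minimal sketch of the difference between the two layouts. It assumes nn.LSTM as the recurrent module, and the sizes are made up purely for illustration. Note that, as far as I know, batch_first only affects the input and output tensors; the hidden state keeps its (num_layers, batch, hidden_size) layout either way.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, input_size, hidden_size = 4, 7, 10, 16

# Default layout: input is (seq_len, batch, input_size).
rnn_default = nn.LSTM(input_size, hidden_size)
x_time_major = torch.randn(seq_len, batch_size, input_size)
out_default, (h_default, c_default) = rnn_default(x_time_major)

# With batch_first=True: input and output are (batch, seq_len, features).
rnn_batch_first = nn.LSTM(input_size, hidden_size, batch_first=True)
x_batch_major = torch.randn(batch_size, seq_len, input_size)
out_batch_first, (h_bf, c_bf) = rnn_batch_first(x_batch_major)

print(out_default.shape)      # (seq_len, batch, hidden_size)
print(out_batch_first.shape)  # (batch, seq_len, hidden_size)
print(h_bf.shape)             # (num_layers, batch, hidden_size) in both cases
```

If your data already comes as (batch, seq_len, features), setting batch_first=True saves you from transposing it before every forward pass.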
Thank you for the answer!