Why is RNN's batch_first parameter set to False by default?

Hello everyone. I'm a beginner at NLP, and I'm learning how to use nn.RNN from the PyTorch docs.

In the docs, nn.RNN's batch_first parameter is False by default.

It seems that people usually use batch_first=False. (I've seen many input datasets shaped like that, not only in simple NLP models but also in complicated ones, e.g. BERT.)

Can you tell me why people shape their datasets as T x B x * (where T is the max sequence length)?

Are there any advantages to shaping the dataset like that?

There's no strict advantage or disadvantage. One possible reason is that an RNN iterates over the sequence dimension (it takes timestep 0, then timestep 1, and so on), so it makes "sense" to have that dimension first. In practice, though, it doesn't really make a difference.
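To make the "iterate over time" point concrete, here is a minimal sketch, assuming a single-layer nn.RNN with the default tanh nonlinearity and no initial hidden state, that unrolls the recurrence by hand. With the time-first layout, x[t] is the whole batch at timestep t and is one contiguous block of memory (with a B x T x * tensor, the equivalent slice x[:, t] is strided). The sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, B, input_size, hidden_size = 5, 3, 4, 6  # sequence length, batch, features, hidden units

rnn = nn.RNN(input_size, hidden_size)  # batch_first=False by default
x = torch.randn(T, B, input_size)      # (T, B, input_size)

out, h_n = rnn(x)  # out: (T, B, hidden_size), h_n: (1, B, hidden_size)

# Manual unroll: with time first, x[t] is the whole batch at timestep t, shape (B, input_size)
h = torch.zeros(B, hidden_size)
manual = []
for t in range(T):
    step = x[t]
    assert step.is_contiguous()  # a time-first slice is one contiguous block of memory
    # h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
    h = torch.tanh(step @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                   + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
    manual.append(h)
manual = torch.stack(manual)  # (T, B, hidden_size)

print(torch.allclose(out, manual, atol=1e-5))  # True
```

The built-in module does the same loop internally; the time-first layout just makes each step's slice cheap to grab.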

I would suggest just going with the default option, simply because it's the default.
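A quick way to check that the choice doesn't matter numerically is to copy the weights of a default RNN into one built with batch_first=True, feed it the transposed input, and compare. This is a sketch with arbitrary sizes. Note that batch_first only changes the layout of the input and output; h_n stays (num_layers, B, hidden_size) either way.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, B, input_size, hidden_size = 5, 3, 4, 6

rnn_tf = nn.RNN(input_size, hidden_size)                    # expects (T, B, input_size)
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)  # expects (B, T, input_size)
rnn_bf.load_state_dict(rnn_tf.state_dict())                 # identical weights in both modules

x_tf = torch.randn(T, B, input_size)
x_bf = x_tf.transpose(0, 1)  # same data, (B, T, input_size)

out_tf, h_tf = rnn_tf(x_tf)  # out_tf: (T, B, hidden_size)
out_bf, h_bf = rnn_bf(x_bf)  # out_bf: (B, T, hidden_size)

# Only the layout of the output differs; h_n is (num_layers, B, hidden_size) in both cases
print(torch.allclose(out_tf.transpose(0, 1), out_bf, atol=1e-6))  # True
print(torch.allclose(h_tf, h_bf, atol=1e-6))                      # True
```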


Hi, I was going through PyTorch's machine translation example, Language Translation with nn.Transformer and torchtext (PyTorch Tutorials 1.12.0+cu102 documentation), and it seems that even here the dataset is T (max_len) x B x D_model. I don't understand how this works. My understanding is that in this example we are not feeding the data sequentially; we are basically feeding a batch of sentences as src and tgt. So wouldn't it make more sense for the dataset to be in the format B x T x D_model instead of T x B x D_model? Does the latter format have some advantage, or is something wrong with my understanding?
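For what it's worth, nn.Transformer and its building blocks follow the same convention as nn.RNN: by default they expect (S, N, E), and recent versions (including 1.12) accept batch_first=True. Self-attention processes every position of every sentence in a single call either way, so T x B x D_model is just the default layout, not something the computation depends on. The sketch below (arbitrary sizes; dropout disabled so the two layers are comparable) checks this on a single nn.TransformerEncoderLayer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

S, B, d_model, nhead = 7, 2, 16, 4  # source length, batch, model width, attention heads

# dropout=0.0 so both layers are deterministic and directly comparable
layer_tf = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=32, dropout=0.0)  # (S, B, d_model)
layer_bf = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=32, dropout=0.0,
                                      batch_first=True)                                 # (B, S, d_model)
layer_bf.load_state_dict(layer_tf.state_dict())  # same weights in both layers
layer_tf.eval()
layer_bf.eval()

src = torch.randn(S, B, d_model)

out_tf = layer_tf(src)                  # all positions of all sentences in one call
out_bf = layer_bf(src.transpose(0, 1))  # same data in batch-first layout

print(out_tf.shape, out_bf.shape)  # torch.Size([7, 2, 16]) torch.Size([2, 7, 16])
print(torch.allclose(out_tf.transpose(0, 1), out_bf, atol=1e-5))  # True
```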

I need the community's help. I'm just getting started with Transformers and have been stuck on this part for a week.