Why is nn.RNN's batch_first parameter set to False by default?

Hello everyone. I'm a beginner at NLP, and I'm learning how to use nn.RNN from the PyTorch docs.

Before I start: I'm not great at English grammar, sorry! XD

In the docs, nn.RNN's parameter batch_first is False by default.

It seems that people usually use batch_first == False. (I've seen a lot of input datasets shaped that way, not only in simple NLP models but also in complicated ones, e.g. BERT.)

Can you let me know why people shape their datasets as T x B x * (where T is the max sequence length)?

Are there any advantages to shaping the dataset like that?

Thank you!!


There's no strict advantage or disadvantage. It may simply be because, in an RNN, we iterate over the sequence dimension (we take timestep 0, then timestep 1, etc.), so it makes "sense" to have that dimension first. But it doesn't really make a difference.

I would suggest just going with the default option, simply because it's the default.
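To make the two layouts concrete, here's a minimal sketch (all sizes are made up for illustration) showing that batch_first only changes how the input tensor is laid out; with the same weights, the two layouts produce the same numbers once you transpose:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, B, input_size, hidden_size = 5, 3, 4, 8  # illustrative sizes

# Default layout: input shaped (seq_len, batch, input_size)
rnn = nn.RNN(input_size, hidden_size)  # batch_first=False by default
x_seq_first = torch.randn(T, B, input_size)
out1, h1 = rnn(x_seq_first)
print(out1.shape)  # torch.Size([5, 3, 8]) -> (T, B, hidden_size)

# Same weights, batch-first layout: input shaped (batch, seq_len, input_size)
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
rnn_bf.load_state_dict(rnn.state_dict())  # parameter shapes are identical
out2, h2 = rnn_bf(x_seq_first.transpose(0, 1))

# The outputs agree once we transpose back
print(torch.allclose(out1, out2.transpose(0, 1), atol=1e-6))  # True
```

So the choice is mostly a layout convention, not a modeling decision.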


Thank you for your helpful answer :smiley:

Hi, I was going through PyTorch's machine translation example (Language Translation with nn.Transformer and torchtext — PyTorch Tutorials 1.12.0+cu102 documentation), and it seems that even here the dataset is T(max_len) x B x Dmodel. I'm failing to understand how this works. My understanding of this example program is that we are not feeding the data sequentially; we are basically feeding a batch of sentences as src and tgt. So wouldn't it make more sense for the dataset to be in the format B x T x Dmodel instead of T x B x Dmodel? Or does the latter format have some advantage? Or is something wrong with my understanding?
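For what it's worth, here's a small sketch (sizes are made up) showing that nn.Transformer follows the same convention: it defaults to batch_first=False, i.e. (T, B, d_model) inputs, and recent PyTorch versions also accept batch_first=True for (B, T, d_model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T_src, T_tgt, B, d_model = 10, 7, 4, 16  # illustrative sizes

# Default: inputs shaped (seq_len, batch, d_model), like nn.RNN
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32)
src = torch.randn(T_src, B, d_model)
tgt = torch.randn(T_tgt, B, d_model)
out = model(src, tgt)
print(out.shape)  # torch.Size([7, 4, 16]) -> (T_tgt, B, d_model)

# With batch_first=True, the same model takes (batch, seq_len, d_model)
model_bf = nn.Transformer(d_model=d_model, nhead=4,
                          num_encoder_layers=1, num_decoder_layers=1,
                          dim_feedforward=32, batch_first=True)
out_bf = model_bf(src.transpose(0, 1), tgt.transpose(0, 1))
print(out_bf.shape)  # torch.Size([4, 7, 16]) -> (B, T_tgt, d_model)
```

Either layout carries the same information; the tutorial just sticks with PyTorch's historical sequence-first default.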

I need the community's help. I'm just getting started with Transformers and have been stuck on this part for a week.