Why is nn.RNN’s batch_first parameter set to False by default?

111335 · August 27, 2020, 5:03pm

Hello everyone. I’m a beginner at NLP, and I’m learning how to use nn.RNN from the PyTorch docs.

(Before I start: I’m not good at English grammar XD)

In the docs, nn.RNN’s parameter ‘batch_first’ is False by default.

It seems that people usually use batch_first=False. (I have seen many input datasets shaped that way, not only in simple NLP models but also in complicated ones, e.g. BERT.)

Can you let me know why people shape their datasets as T x B x * (where T is the max length)?

Are there any advantages to shaping the dataset like that?

Thank you!!

ayalaa2 · August 27, 2020, 5:37pm

There are no strict advantages or disadvantages. It could be because, in an RNN, we’re iterating over the sequence dimension (we take timestep 0, then timestep 1, etc.), so it makes “sense” to have that dimension first. But it doesn’t really make a difference to the result.

I would suggest just going with the default option, only because it’s the default.
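
To make the "it doesn’t really make a difference" point concrete, here is a minimal sketch (not from the docs) that runs the same weights through a time-major and a batch-first nn.RNN and compares the outputs. The sizes T, B, D, H and the variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustrative sizes: sequence length, batch size, input size, hidden size.
T, B, D, H = 5, 3, 4, 6

rnn_tm = nn.RNN(input_size=D, hidden_size=H)                    # default: batch_first=False
rnn_bf = nn.RNN(input_size=D, hidden_size=H, batch_first=True)  # expects (B, T, D)
rnn_bf.load_state_dict(rnn_tm.state_dict())                     # give both the same weights

x_tm = torch.randn(T, B, D)                # time-major input, shape (T, B, D)
x_bf = x_tm.transpose(0, 1).contiguous()   # same data, batch-first, shape (B, T, D)

with torch.no_grad():
    out_tm, h_tm = rnn_tm(x_tm)   # out_tm: (T, B, H)
    out_bf, h_bf = rnn_bf(x_bf)   # out_bf: (B, T, H)

# The outputs match once the first two dimensions are swapped back.
same_outputs = torch.allclose(out_tm.transpose(0, 1), out_bf, atol=1e-6)
# The final hidden state keeps the shape (num_layers, B, H) in both cases.
same_hidden = torch.allclose(h_tm, h_bf, atol=1e-6)

print(out_tm.shape, out_bf.shape, h_tm.shape, h_bf.shape)
print(same_outputs, same_hidden)
```

Note that batch_first only changes the layout of the input and output tensors; the hidden state h_n is laid out the same way either way.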

111335 · August 28, 2020, 1:22pm

Thank you for your helpful answer

Prabu_Rocking · July 9, 2022, 4:43pm

Hi, I was going through PyTorch’s machine translation example, “Language Translation with nn.Transformer and torchtext” (PyTorch Tutorials 1.12.0+cu102 documentation), and it seems that even here the dataset is T(max_len) x B x Dmodel. I am failing to understand how this works. In this example, my understanding is that we are not feeding the data sequentially; we are basically feeding a batch of sentences as src and tgt. So doesn’t it make more sense for the dataset to be in the format B x T x Dmodel instead of T x B x Dmodel? Or does the latter format have some advantage? Or is something wrong with my understanding?

I need the community’s help. I am just getting started with Transformers and have been stuck on this part for a week.
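
For reference, a minimal sketch of the two layouts with a transformer layer, assuming PyTorch 1.9 or later (where nn.TransformerEncoderLayer accepts a batch_first argument, which also defaults to False). Attention looks at all positions at once, so the layout here is a convention and not a requirement; with identical weights both layouts should give the same result. The sizes and variable names below are made up for illustration, and dropout is set to 0.0 so the comparison is deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustrative sizes: sequence length, batch size, model dimension.
T, B, D = 7, 2, 16

# Same layer twice: once sequence-first (the default), once batch-first.
layer_tm = nn.TransformerEncoderLayer(d_model=D, nhead=4, dim_feedforward=32, dropout=0.0)
layer_bf = nn.TransformerEncoderLayer(d_model=D, nhead=4, dim_feedforward=32, dropout=0.0,
                                      batch_first=True)
layer_bf.load_state_dict(layer_tm.state_dict())  # share the weights

src_bf = torch.randn(B, T, D)                 # B x T x Dmodel
src_tm = src_bf.transpose(0, 1).contiguous()  # T x B x Dmodel, same data

out_bf = layer_bf(src_bf)   # shape (B, T, D)
out_tm = layer_tm(src_tm)   # shape (T, B, D)

# Swapping the first two dimensions back shows that the layouts agree.
layouts_agree = torch.allclose(out_bf, out_tm.transpose(0, 1), atol=1e-5)
print(out_bf.shape, out_tm.shape, layouts_agree)
```

So if B x T x Dmodel feels more natural, one option is to pass batch_first=True; another is to keep the default and call transpose(0, 1) on the batch before feeding it in.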