Hello everyone. I'm a beginner at NLP, and I'm learning how to use nn.RNN from the PyTorch docs.
Before I start: I'm not good at English grammar XD
In the docs, nn.RNN's parameter `batch_first` is False by default.
It seems that people usually use batch_first=False. (I have seen a lot of input datasets laid out that way, not only in simple NLP models but also in complicated ones, e.g. BERT.)
Can you let me know why people shape their datasets as T x B x * (where T is the max sequence length and B is the batch size)?
Are there any advantages to shaping my dataset like that?
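To make the question concrete, here is a minimal sketch of the two layouts I mean. The sizes (T=5, B=3, 4 input features, hidden size 8) and the variable names are made up just for illustration.

```python
import torch
import torch.nn as nn

# Made-up sizes, for illustration only
T, B, FEAT, H = 5, 3, 4, 8  # max length, batch size, input features, hidden size

# Default layout: batch_first=False, input is T x B x *
rnn_tb = nn.RNN(input_size=FEAT, hidden_size=H)
x_tb = torch.randn(T, B, FEAT)
out_tb, h_tb = rnn_tb(x_tb)
print(out_tb.shape)  # output is T x B x H

# Alternative layout: batch_first=True, input is B x T x *
rnn_bt = nn.RNN(input_size=FEAT, hidden_size=H, batch_first=True)
x_bt = x_tb.transpose(0, 1)  # same data, batch dimension moved to the front
out_bt, h_bt = rnn_bt(x_bt)
print(out_bt.shape)  # output is B x T x H

# The final hidden state is (num_layers, B, H) in both cases;
# batch_first only affects the input and output tensors.
print(h_tb.shape, h_bt.shape)
```

As far as I can tell, both versions run fine and only the order of the first two dimensions differs, which is why I am wondering what the T x B x * layout buys me.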
Thank you!!