Why is the parameter 'batch_first' needed?

Hi! I am a newbie to PyTorch and deep learning in general, and I find the documentation on the output shape of RNN layers a little confusing.

I’ve noticed that the default output shape of RNN layers is (seq_len, batch, num_directions * hidden_size), but setting batch_first=True changes it to (batch, seq_len, num_directions * hidden_size). Why isn’t batch first the default output shape? Is it for optimization reasons?
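For reference, here is a minimal sketch of the two layouts with `nn.RNN` (sizes are made up for illustration). Note that the hidden state returned by the RNN keeps the same shape either way; `batch_first` only affects the input and output tensors.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

# Default layout: input and output are (seq_len, batch, features).
rnn = nn.RNN(input_size, hidden_size)
out, h = rnn(torch.randn(seq_len, batch, input_size))
print(out.shape)  # torch.Size([5, 3, 20])

# batch_first=True: input and output are (batch, seq_len, features).
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(batch, seq_len, input_size))
print(out_bf.shape)  # torch.Size([3, 5, 20])

# The hidden state is (num_layers * num_directions, batch, hidden_size)
# in both cases, i.e. (1, 3, 20) here.
print(h.shape, h_bf.shape)
```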


If you organise things sequence first, then each timestep (which works much like a regular layer: a linear on the hidden state, a linear on the input, nonlinearities, and gating) operates on a contiguous block of data, which gives you good caching behaviour.
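You can see the contiguity point directly by slicing out a single timestep from each layout (a small sketch; the sizes are arbitrary):

```python
import torch

seq_len, batch, feat = 5, 3, 4

# Sequence-first layout: timestep t is one contiguous (batch, feat) block,
# so the per-step matrix multiply reads a single contiguous chunk of memory.
x = torch.randn(seq_len, batch, feat)
print(x[2].is_contiguous())  # True

# Batch-first layout: timestep t is a strided slice, one row per sample,
# spread across the tensor's memory.
z = torch.randn(batch, seq_len, feat)
print(z[:, 2].is_contiguous())  # False
```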

Best regards



Thank you for the reply, Thomas! So does this mean setting batch_first=True will compromise computing performance/speed?

That is the idea. I haven’t benchmarked it myself, though, so I’m not in the best position to discuss this in detail.
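If you want to measure it yourself, a rough timing sketch like the one below would do (not a rigorous benchmark; sizes and iteration counts are arbitrary, and results will vary by hardware):

```python
import time
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 100, 64, 128, 128

def time_forward(rnn, x, iters=20):
    # Average forward-pass time over several iterations, after a warm-up run.
    with torch.no_grad():
        rnn(x)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            rnn(x)
    return (time.perf_counter() - start) / iters

t_seq = time_forward(nn.RNN(input_size, hidden_size),
                     torch.randn(seq_len, batch, input_size))
t_bf = time_forward(nn.RNN(input_size, hidden_size, batch_first=True),
                    torch.randn(batch, seq_len, input_size))
print(f"seq-first: {t_seq * 1e3:.2f} ms  batch-first: {t_bf * 1e3:.2f} ms")
```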
