Why is the 'batch_first' parameter needed?

Dan_L1 · September 23, 2018, 7:25am

Hi! I am new to PyTorch and deep learning in general, and I have found the documentation on the output shape of RNN layers a little confusing.

I’ve noticed that the default output shape of RNN layers is (seq_len, batch, num_directions * hidden_size), but setting ‘batch_first=True’ changes the output shape to (batch, seq, feature). Why isn’t batch first the default output shape? Is it because of some optimization issue?
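To make the two layouts concrete, here is a minimal sketch using `nn.LSTM`. The sizes are arbitrary values picked for illustration, and a single-layer, unidirectional LSTM is assumed, so `num_directions * hidden_size` is just `hidden_size`.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from the thread)
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

# Default layout: input and output are (seq_len, batch, feature)
rnn = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch, input_size)
out, (h_n, c_n) = rnn(x)
out_shape = tuple(out.shape)

# batch_first=True: input and output are (batch, seq_len, feature)
rnn_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
x_bf = torch.randn(batch, seq_len, input_size)
out_bf, (h_n_bf, c_n_bf) = rnn_bf(x_bf)
out_bf_shape = tuple(out_bf.shape)

# The hidden state keeps its (num_layers * num_directions, batch, hidden_size)
# layout; batch_first only affects the input and output tensors.
h_n_shape = tuple(h_n_bf.shape)

print(out_shape)     # (5, 3, 20)
print(out_bf_shape)  # (3, 5, 20)
print(h_n_shape)     # (1, 3, 20)
```

Note that `h_n` and `c_n` do not change layout when `batch_first=True` is set, which is an easy thing to trip over.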

Thanks!

tom · September 23, 2018, 11:13am

If you organise things sequence-first, then each timestep, which is much like a regular layer (linear on hidden + linear on input + nonlinearities + gating), operates on a contiguous block of data, so you get good caching properties etc.
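A small sketch of what "contiguous" means here, assuming plain tensors with made-up sizes: slicing out one timestep from a sequence-first tensor gives a contiguous block, while the same slice from a batch-first tensor is strided.

```python
import torch

# Illustrative sizes (assumptions for this sketch)
seq_len, batch, feature = 4, 2, 3

seq_first = torch.zeros(seq_len, batch, feature)
batch_first = torch.zeros(batch, seq_len, feature)

# Data for timestep 0 in each layout; both slices have shape (batch, feature)
step_sf = seq_first[0]
step_bf = batch_first[:, 0]

sf_contig = step_sf.is_contiguous()
bf_contig = step_bf.is_contiguous()
sf_stride = step_sf.stride()
bf_stride = step_bf.stride()

print(sf_contig, sf_stride)  # True (3, 1)
print(bf_contig, bf_stride)  # False (12, 1)
```

In the batch-first case, consecutive batch rows of one timestep sit `seq_len * feature` elements apart in memory, so reading a timestep means jumping over the other timesteps of each sample.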

Best regards

Thomas

Dan_L1 · October 1, 2018, 8:47am

Thank you for the reply Thomas! So does this mean that setting ‘batch_first=True’ will compromise computing performance/speed?

tom · October 1, 2018, 9:41am

That is the idea. I haven’t benchmarked it myself, though, so I’m not in the best position to discuss this in detail.
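For anyone who wants to check this on their own setup, a rough timing sketch could look like the following. The helper `time_forward` and all sizes are hypothetical choices for illustration; it runs on CPU, and the numbers will depend on hardware, backend, and PyTorch version, so treat it as a starting point rather than a proper benchmark.

```python
import time
import torch
import torch.nn as nn

def time_forward(batch_first, seq_len=50, batch=16, input_size=32,
                 hidden_size=64, repeats=5):
    """Hypothetical helper: average forward time of an LSTM in one layout."""
    torch.manual_seed(0)
    rnn = nn.LSTM(input_size, hidden_size, batch_first=batch_first)
    if batch_first:
        x = torch.randn(batch, seq_len, input_size)
    else:
        x = torch.randn(seq_len, batch, input_size)
    with torch.no_grad():
        rnn(x)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            out, _ = rnn(x)
        elapsed = (time.perf_counter() - start) / repeats
    return elapsed, tuple(out.shape)

t_seq, shape_seq = time_forward(batch_first=False)
t_bf, shape_bf = time_forward(batch_first=True)

print(f"seq-first:   {t_seq * 1e3:.3f} ms, output shape {shape_seq}")
print(f"batch-first: {t_bf * 1e3:.3f} ms, output shape {shape_bf}")
```

On a GPU you would also need to move the module and input to the device and call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.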