Got a question about the implementation of nn.Transformer

The input tensor src has shape (S, N, E) and the target tensor tgt has shape (T, N, E); both place the sequence length (S and T) before the batch dimension (N). I'm curious why the developers arranged the shapes this way. By convention, we usually put the batch dimension first.
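For reference, here is a minimal sketch of what I mean (the sizes are just placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

S, T, N, E = 10, 20, 32, 512  # hypothetical sizes: src len, tgt len, batch, embed dim

model = nn.Transformer(d_model=E)
src = torch.rand(S, N, E)  # source: sequence length first, batch second
tgt = torch.rand(T, N, E)  # target: same convention
out = model(src, tgt)
print(out.shape)  # torch.Size([20, 32, 512]) -- (T, N, E)
```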

My guess is that it's for performance reasons, the same layout convention that was used for RNNs.
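For example, with the (S, N, E) layout, slicing out a single time step gives you the whole batch for that step as one contiguous tensor, which is convenient for the step-by-step loops RNNs run. A minimal sketch with made-up sizes:

```python
import torch

S, N, E = 10, 32, 512
src = torch.rand(S, N, E)
step = src[0]                 # (N, E): the whole batch at time step 0
print(step.is_contiguous())   # True -- one step's data sits together in memory
```

If I recall correctly, more recent PyTorch releases also expose a batch_first argument on nn.Transformer, so you can use (N, S, E) inputs if you prefer.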