Sorry if this is a duplicate — I couldn’t find a similar question.
Is there a reason why PyTorch uses the [N, X] dimension format? That is, the built-in layers take inputs of shape [BatchSize, NumberOfInputFeatures] rather than [NumberOfInputFeatures, BatchSize], and the broadcasting semantics are arranged accordingly.
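To make the question concrete, here is a minimal sketch of the convention I mean (the layer sizes are chosen arbitrarily):

```python
import torch
import torch.nn as nn

# A layer mapping 3 input features to 2 output features.
layer = nn.Linear(in_features=3, out_features=2)

# PyTorch expects [BatchSize, NumberOfInputFeatures]:
x = torch.randn(4, 3)  # a batch of 4 samples, each with 3 features
y = layer(x)
print(y.shape)  # [BatchSize, NumberOfOutputFeatures]
```

Passing a [3, 4] tensor here (features first, batch second) would raise a shape error.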
Was this an arbitrary design choice, or was it intentional for some efficiency reason? I ask because it goes against the dimension convention generally used in papers, where an input is typically written as a column vector x and a layer computes Wx + b, which would put the batch in the second dimension.