PyTorch dimension conventions?


Sorry if this is a duplicate question yet I couldn’t find a similar question.

Is there a reason why PyTorch uses the [N, X] dimension format? That is, the prebuilt layers take inputs with dimensionality [BatchSize, NumberOfInputFeatures] rather than [NumberOfInputFeatures, BatchSize]. The broadcasting semantics are also arranged accordingly.
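To make the layout concrete, here is a minimal sketch of what the batch-first convention looks like with `nn.Linear` (the layer sizes below are arbitrary examples):

```python
import torch
import torch.nn as nn

# A linear layer mapping 4 input features to 2 output features.
layer = nn.Linear(in_features=4, out_features=2)

# PyTorch expects [BatchSize, NumberOfInputFeatures]:
x = torch.randn(8, 4)   # a batch of 8 samples, 4 features each
y = layer(x)
print(y.shape)          # torch.Size([8, 2]) -- batch dimension stays first

# The transposed layout [NumberOfInputFeatures, BatchSize] does not match
# the layer's expected last dimension and raises a RuntimeError:
# layer(torch.randn(4, 8))
```

Broadcasting follows the same arrangement: a per-feature tensor of shape `[4]` broadcasts cleanly against the `[8, 4]` batch, e.g. `x + torch.zeros(4)`, because trailing dimensions are aligned first.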

Was this an arbitrary design choice, or was it intentional for some efficiency reason? I am asking because it goes against the dimension conventions generally embraced in papers etc.



I have enrolled in different courses and worked with some of the popular frameworks, and it seems it is all about convention. The most widely used one is the batch-first layout that PyTorch uses.

By the way, Andrew Ng uses the same convention as PyTorch in the computer vision and deep learning courses he instructs at Stanford University.

Good luck
