Dimension ordering seems to be inconsistent for 1D networks (for Natural Language Processing and other 1D signal processing). When combining embeddings, convolutions, and recurrent networks, this requires multiple dimension-permutation operations throughout the code, which makes it less readable, more error-prone, and precludes using `Sequential` to combine layers.
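To illustrate the problem, here is a minimal sketch of a typical `Embedding` → `Conv1d` → `LSTM` pipeline (all sizes are arbitrary, chosen only for illustration). Two permutes are needed just to glue three standard layers together, which is exactly what prevents wrapping them in `Sequential`:

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration only.
N, L, V, C = 4, 10, 100, 8  # batch, sequence length, vocab size, channels

emb = nn.Embedding(V, C)                            # outputs (N, L, C)
conv = nn.Conv1d(C, C, kernel_size=3, padding=1)    # expects (N, C, L)
rnn = nn.LSTM(C, C, batch_first=True)               # expects (N, L, C)

x = torch.randint(V, (N, L))
h = emb(x)               # (N, L, C)
h = h.permute(0, 2, 1)   # (N, C, L) for Conv1d
h = conv(h)              # (N, C, L)
h = h.permute(0, 2, 1)   # back to (N, L, C) for the batch_first LSTM
out, _ = rnn(h)          # (N, L, C)
print(out.shape)         # torch.Size([4, 10, 8])
```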
## Examples
That is, for:
- N - batch size / sample size
- L - sequence length
- C - the number of features / channels / filters
we get:
`(N, C, L)`
- `Conv1d`, `MaxPool1d`, `BatchNorm1d`, etc.
`(N, L, C)`
- `LSTM`, `GRU` with `batch_first=True`
- `Embedding` (output)
- `Linear` (assuming we typically mix channels; vide 1x1 convolution)
`(L, N, C)`
- `LSTM`, `GRU` with default options
`(N, *)`
- `DataLoader`
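The orderings listed above can be verified directly with shape checks (a quick sketch; layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

N, L, C = 4, 10, 8  # arbitrary batch size, sequence length, channels

# (N, C, L): convolution / pooling / normalization layers
assert nn.Conv1d(C, C, kernel_size=3, padding=1)(torch.randn(N, C, L)).shape == (N, C, L)
assert nn.BatchNorm1d(C)(torch.randn(N, C, L)).shape == (N, C, L)

# (N, L, C): recurrent layers with batch_first=True; Linear acts on the last dim
out, _ = nn.LSTM(C, C, batch_first=True)(torch.randn(N, L, C))
assert out.shape == (N, L, C)
assert nn.Linear(C, C)(torch.randn(N, L, C)).shape == (N, L, C)

# (L, N, C): recurrent layers with default options
out, _ = nn.LSTM(C, C)(torch.randn(L, N, C))
assert out.shape == (L, N, C)
```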
## Questions
- Why the different orders?
- At least, is there a canonical PyTorch dimension ordering?
- Which permute operations affect performance?
With `batch_first=False`, regardless of whether we use the output or the hidden units, we need to run `x = x.transpose(0, 1).contiguous()` to pass the result to linear operators. Does the speed-up from using these options outweigh the slowdown from reordering dimensions?
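On the cost side, note that `transpose`/`permute` only create a view; the copy (and most of the cost) happens at `.contiguous()`. A minimal sketch of the pattern above, with arbitrary sizes:

```python
import torch
import torch.nn as nn

L, N, C = 10, 4, 8               # arbitrary sequence length, batch, channels
rnn = nn.LSTM(C, C)              # default: batch_first=False, i.e. (L, N, C)
x = torch.randn(L, N, C)

out, _ = rnn(x)                  # (L, N, C)
y = out.transpose(0, 1)          # (N, L, C) -- a view, no data copied yet
assert y.data_ptr() == out.data_ptr() and not y.is_contiguous()

y = y.contiguous()               # the actual copy (and cost) happens here
head = nn.Linear(C, 2)           # applies to the last dimension
print(head(y).shape)             # torch.Size([4, 10, 2])
```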