I am confused by the input shape conventions used in PyTorch in some cases:

The nn.Linear’s input is of shape (N, *, H_in) where N is the batch size, H_in is the number of features, and * means “any number of additional dimensions”. What exactly are these additional dimensions, and how is nn.Linear applied to them?
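From experimenting, the extra dimensions seem to behave like extra batch dimensions: the same weight matrix is applied independently to every H_in-sized vector along the last dimension. A quick check (shapes chosen arbitrarily):

```python
import torch
import torch.nn as nn

# nn.Linear transforms only the last dimension; the "*" dimensions are
# treated like additional batch dimensions, so the same (H_out, H_in)
# weight matrix multiplies every length-8 vector independently.
linear = nn.Linear(8, 5)       # H_in = 8, H_out = 5
x = torch.randn(2, 3, 4, 8)    # (N, *, H_in) with * = (3, 4)
y = linear(x)
print(y.shape)                 # torch.Size([2, 3, 4, 5])
```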

The nn.Conv1d’s input is of shape (N, C_in, L) where N is the batch size as before, C_in is the number of input channels, and L is the length of the signal sequence.
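For reference, a minimal shape check (channel and length values picked arbitrarily):

```python
import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3)
x = torch.randn(8, 16, 50)     # (N, C_in, L)
y = conv1d(x)
# With stride 1 and no padding, the length shrinks to L - kernel_size + 1.
print(y.shape)                 # torch.Size([8, 33, 48])
```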

The nn.Conv2d’s input is of shape (N, C_in, H, W) where N is the batch size as before, C_in the number of input channels, H is the height and W the width of the image.
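Same kind of check for the 2D case (again with arbitrary sizes):

```python
import torch
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
x = torch.randn(8, 3, 32, 32)  # (N, C_in, H, W)
y = conv2d(x)
# Each spatial dimension shrinks by kernel_size - 1 without padding.
print(y.shape)                 # torch.Size([8, 16, 30, 30])
```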

The nn.BatchNorm1d’s input is of shape (N, C) or (N, C, L) where N is the batch size as before. However, what do C and L denote here? It seems that C = number of features and L = number of channels, based on the description in the documentation: “2D or 3D input (a minibatch of 1D inputs with optional additional channel dimension)”. This is inconsistent with the nn.Conv1d notation.
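What I can verify is that the same num_features value matches C in both input shapes, whatever it is called:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=16)
# 2D input: C is the per-sample feature dimension.
y1 = bn(torch.randn(8, 16))        # (N, C)
# 3D input: the same C, now acting like a channel dim over a length L.
y2 = bn(torch.randn(8, 16, 50))    # (N, C, L)
print(y1.shape, y2.shape)          # torch.Size([8, 16]) torch.Size([8, 16, 50])
```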

The nn.BatchNorm2d’s input is of shape (N, C, H, W) where N is the batch size as before, and H and W are the height and width of the image respectively. What does C denote here? Is it the number of features as in nn.BatchNorm1d, or the number of channels as in nn.Conv2d? It seems to be the number of channels, since we are talking about “a 4D input (a minibatch of 2D inputs with additional channel dimension)”, but then the documentation has the line “num_features – C from an expected input of size (N, C, H, W)”, so C is both the number of channels and the number of features, which is weird. So perhaps num_features should be renamed to num_channels.
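Behaviourally, num_features clearly plays the role of the channel count here: normalization statistics are computed per channel, over the N, H and W dimensions, so each channel ends up with roughly zero mean after the layer (in training mode, with the default affine parameters):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)   # num_features == C, the channel count
x = torch.randn(8, 3, 32, 32)         # (N, C, H, W)
y = bn(x)
print(y.shape)                        # torch.Size([8, 3, 32, 32])
# One mean/var (and one gamma/beta) per channel; channel 0 of the output
# is normalized across all N, H, W positions.
print(y[:, 0].mean().abs().item() < 1e-3)
```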