# Batch Normalization over which dimension?

Hello everyone,

Over which dimension do we calculate the mean and std? Is it over the hidden dimensions of the NN layer, or over all the samples in the batch for every hidden dimension separately?
In the paper it says we normalize over the batch.
In torch.nn.BatchNorm1d, however, the input argument is `num_features`. Why would we calculate the mean and std over the different features instead of the different samples?

You are correct that `num_features` corresponds to the “hidden dimension” rather than the batch size. However, if you think about this from the perspective of what statistics batchnorm needs to track, this makes sense. For example, for a hidden dimension of size 512, batchnorm needs to keep track of mean and variance for each of the 512 dimensions. Here, `num_features` is really just telling the module how much storage it needs to track its stats. Note that this size doesn’t depend on the batch size as taking the mean reduces across the batch dimension.
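A minimal sketch illustrating this: the running statistics have shape `(num_features,)`, and in training mode the output matches normalizing each feature with the mean and variance taken across the batch dimension. The batch size and feature size below are arbitrary choices for the example.

```python
import torch

torch.manual_seed(0)
batch_size, num_features = 8, 512  # arbitrary example sizes
x = torch.randn(batch_size, num_features)

bn = torch.nn.BatchNorm1d(num_features)
y = bn(x)

# One mean and one variance per feature, independent of batch size:
assert bn.running_mean.shape == (num_features,)
assert bn.running_var.shape == (num_features,)

# In training mode (with default weight=1, bias=0), the output equals
# normalizing each feature using stats reduced over the batch dim (dim=0):
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
manual = (x - mean) / torch.sqrt(var + bn.eps)
assert torch.allclose(y, manual, atol=1e-5)
```

So taking the mean over `dim=0` (the batch) is exactly why the stored statistics are sized by `num_features` and not by the batch size.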