I want to use a BatchNorm1d layer for standardization, i.e. simply subtract the mean and divide by the standard deviation for embeddings of shape (N, C), where N is the batch size and C is the embedding dimension. I think that setting affine and track_running_stats both to False should do the trick, i.e.:
standardization = nn.BatchNorm1d(num_features=C, affine=False, track_running_stats=False)
Is this correct, or should I also set momentum to 1?
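
For reference, here is a minimal sketch of how I would check this, comparing the layer against a manual standardization. The values of N and C and the variable names are just assumptions for illustration. As far as I understand, BatchNorm1d uses the biased variance (dividing by N) and adds eps inside the square root, so the manual version does the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 8, 4  # hypothetical batch size and embedding dimension
x = torch.randn(N, C)

standardization = nn.BatchNorm1d(num_features=C, affine=False, track_running_stats=False)
out = standardization(x)

# Manual per-feature standardization over the batch dimension.
# Biased variance (unbiased=False) and eps inside the sqrt, to mirror the layer.
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)
expected = (x - mean) / torch.sqrt(var + standardization.eps)
matches_manual = torch.allclose(out, expected, atol=1e-5)

# With track_running_stats=False there are no running buffers to update.
has_running_stats = standardization.running_mean is not None

# Batch statistics should therefore be used in eval mode as well.
standardization.eval()
matches_in_eval = torch.allclose(standardization(x), expected, atol=1e-5)

# Same layer with momentum=1.0, to see whether momentum changes the output.
with_momentum = nn.BatchNorm1d(num_features=C, affine=False,
                               track_running_stats=False, momentum=1.0)
momentum_irrelevant = torch.allclose(with_momentum(x), out, atol=1e-6)

print(matches_manual, has_running_stats, matches_in_eval, momentum_irrelevant)
```

If momentum only affects how the running statistics are updated, then it should make no difference here, since no running statistics are kept.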