I want to use a BatchNorm1d layer as standardization, i.e. simply subtract the mean and divide by the standard deviation for embeddings of shape (N, C), where N is the batch size and C is the embedding dimension. I think that setting affine and track_running_stats both to False should do the trick, i.e.:
standardization = nn.BatchNorm1d(num_features=C, affine=False, track_running_stats=False)
Is this correct, or should I also set momentum to 1?
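
For reference, here is a minimal sanity check of what I expect the layer to do. It is only a sketch: the sizes N=8 and C=4 and the random input are made up for illustration, and it assumes the layer normalizes with the biased (population) variance plus its eps, which is my reading of the documentation. It compares the layer output against a manual per-feature standardization and leaves momentum at its default.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this check.
N, C = 8, 4
x = torch.randn(N, C) * 3.0 + 5.0  # embeddings with non-zero mean, non-unit std

standardization = nn.BatchNorm1d(
    num_features=C, affine=False, track_running_stats=False
)
out = standardization(x)

# Manual standardization over the batch dimension, using the biased variance
# and the same eps as the layer (assumption: this mirrors what the layer does).
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)
expected = (x - mean) / torch.sqrt(var + standardization.eps)

matches = torch.allclose(out, expected, atol=1e-4)

# With track_running_stats=False no running buffers are kept, so eval mode
# should also fall back to the statistics of the current batch.
standardization.eval()
out_eval = standardization(x)
eval_matches = torch.allclose(out_eval, expected, atol=1e-4)

print(matches, eval_matches, standardization.running_mean)
```

If both comparisons hold and running_mean is None, then there are no running statistics for momentum to update, which would suggest the momentum value does not matter in this configuration.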