Why is 2D batch normalisation used in features and 1D in classifiers?

If there is no difference between them, why are there two different classes? And why isn’t there a universal BatchNorm class that accepts inputs of arbitrary dimensionality?
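To make the question concrete, here is a minimal sketch. It assumes the two layers in question are PyTorch's `nn.BatchNorm1d` and `nn.BatchNorm2d`; the tensor shapes and variable names are made up for illustration. As far as I can tell, both normalise per channel and mainly differ in the input rank they accept:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical convolutional feature maps: (batch, channels, height, width)
feature_maps = torch.randn(8, 16, 32, 32)
bn2d = nn.BatchNorm2d(num_features=16)
out2d = bn2d(feature_maps)  # statistics per channel, taken over (N, H, W)

# Hypothetical flattened classifier activations: (batch, features)
flat = torch.randn(8, 64)
bn1d = nn.BatchNorm1d(num_features=64)
out1d = bn1d(flat)  # statistics per feature, taken over N

# BatchNorm1d also accepts (N, C, L); flattening H and W into L
# gives the same result as BatchNorm2d on the original 4D tensor.
bn1d_channels = nn.BatchNorm1d(num_features=16)
out_equiv = bn1d_channels(feature_maps.flatten(start_dim=2)).view_as(out2d)
same_result = torch.allclose(out2d, out_equiv, atol=1e-5)

# Each class validates the rank of its input, so they are not interchangeable.
try:
    bn2d(flat)
    rank_error = None
except ValueError as exc:
    rank_error = str(exc)
```

If this sketch is right, the numerical operation is the same and the separate classes mostly act as a shape check, which is what prompts the question.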
