Why is 2D batch normalisation used in features and 1D in classifiers?

level1807 · March 30, 2022, 4:04am

If there is no difference between them, then why are there two different classes? And why isn’t there a universal BatchNorm class that accepts inputs with arbitrary dimensions?
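To make the question concrete, here is a minimal sketch, assuming the two classes in question are PyTorch's `nn.BatchNorm1d` and `nn.BatchNorm2d`. The tensor shapes and channel count are made up for illustration. It only shows what I can observe: both keep one set of statistics per channel, but each one checks the number of input dimensions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative shapes (assumptions, not from any real model):
feature_maps = torch.randn(8, 16, 5, 5)  # conv features: (N, C, H, W)
flat_features = torch.randn(8, 16)       # classifier input: (N, C)

bn2d = nn.BatchNorm2d(16)
bn1d = nn.BatchNorm1d(16)

# Modules are in training mode by default, so batch statistics are used.
out2d = bn2d(feature_maps)
out1d = bn1d(flat_features)

# Both modules store one running mean/variance per channel.
print(bn2d.running_mean.shape, bn1d.running_mean.shape)

# Per-channel mean of the output: averaged over N, H, W for 2D,
# and over N only for 1D.
mean2d = out2d.mean(dim=(0, 2, 3))
mean1d = out1d.mean(dim=0)

# BatchNorm2d checks the input rank and refuses a 2D tensor.
try:
    bn2d(flat_features)
    rejected = False
except ValueError as err:
    rejected = True
    print("BatchNorm2d rejected the 2D input:", err)
```

So the parameters look the same in both cases, and the visible difference is the input shape each class accepts, which is what prompts the question above.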