BatchNorm2d essential?

This is a snippet from my network code:

conv_block += [nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=1, padding=1), nn.BatchNorm2d(out_dim), nn.ELU()]

with BatchNorm2d. The network gave good results for the first few training epochs, but after that it became unstable.

During debugging, I removed BatchNorm2d from the network to analyze its effect:

conv_block += [nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=1, padding=1), nn.ELU()]

but the results were very bad and not even comparable to the version with BN.

Why is it so?

BatchNorm layers normalize the input activations, which accelerates learning (or makes it feasible at all).
The original paper claims this is due to the reduction of internal covariate shift.
More recent papers argue that this explanation is inaccurate and offer a different perspective on why BatchNorm helps.
Have a look at the original paper for more information.
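The core normalization step can be sketched in a few lines. This is a minimal numpy illustration of the per-channel statistics BatchNorm2d computes at training time; it omits the learnable affine parameters (gamma, beta) and the running statistics used at eval time:

```python
import numpy as np

np.random.seed(0)

# Toy activations shaped like a conv feature map: (batch, channels, height, width)
x = np.random.randn(8, 4, 5, 5) * 3.0 + 2.0  # deliberately off-center and scaled

# Per-channel mean and variance over the batch and spatial dimensions
mean = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)

# Normalize; eps avoids division by zero for near-constant channels
eps = 1e-5
x_hat = (x - mean) / np.sqrt(var + eps)

# After normalization, each channel has roughly zero mean and unit variance,
# regardless of how the preceding conv layer scaled or shifted its outputs.
print(x_hat.mean(axis=(0, 2, 3)))
print(x_hat.std(axis=(0, 2, 3)))
```

Without this step, the scale of activations flowing into ELU depends entirely on the conv weights, which is one reason removing BN can make the same architecture much harder to train.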