 # Batch Normalization of Linear Layers

Is it possible to perform batch normalization in a network that is only linear layers?

For example:

```python
class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.linear1 = nn.Linear(in_features=40, out_features=320)
        self.linear2 = nn.Linear(in_features=320, out_features=2)

    def forward(input):  # Input is a 1D tensor
        y = F.relu(self.linear1(input))
        # Would it be possible to do a batch normalization of y over here? If so, how?
        y = F.softmax(self.linear2(input))
        return y
```

Sure! You could just use `nn.BatchNorm1d`.
There are some minor issues in your code, so here is a working example:

```python
class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.linear1 = nn.Linear(in_features=40, out_features=320)
        self.bn1 = nn.BatchNorm1d(num_features=320)
        self.linear2 = nn.Linear(in_features=320, out_features=2)

    def forward(self, input):  # input is a 2D tensor [batch_size, num_features]
        y = F.relu(self.bn1(self.linear1(input)))
        y = F.softmax(self.linear2(y), dim=1)
        return y

model = network()
x = torch.randn(10, 40)
output = model(x)
```

You can also put the `BatchNorm` after the `relu`, if you like.


@ptrblck I tried that but I received “ValueError: expected a 2D or 3D input (got 1D input).”

Are you sure you are passing your input as `[batch_dim, num_features]`?
The error sounds like you’ve passed just `[num_features]` to your model.
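To make the shape requirement concrete, here is a minimal sketch: `nn.BatchNorm1d` rejects a bare 1D feature vector, but accepts the same features once a batch dimension is present (the sizes below match the 320-feature layer from the example above).

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=320)

x = torch.randn(320)         # shape [320]: just num_features, no batch dim
# bn(x) would raise: ValueError: expected 2D or 3D input (got 1D input)

batch = torch.randn(8, 320)  # shape [batch_dim, num_features]
out = bn(batch)
print(out.shape)             # torch.Size([8, 320])
```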

When I do that, I get a different error: “ValueError: Expected more than 1 value per channel when training, got input size [1, 320].” This is for a Q-network, so it only receives one state at a time, hence the batch size of 1.

Then `nn.BatchNorm` probably won’t work very well.
Have a look at the normalization layers. Maybe `LayerNorm` or another one will fit your needs.
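As a quick sketch of that suggestion: `nn.LayerNorm` normalizes over the feature dimension of each sample independently, so it has no problem with a batch size of 1 (the 320-feature size below just mirrors the earlier example).

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(320)       # normalizes each sample over its feature dim

state = torch.randn(1, 320)  # a single state, i.e. batch size 1
out = ln(state)              # works fine, unlike BatchNorm1d in train mode
print(out.shape)             # torch.Size([1, 320])
```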

Is the effect the same whether the `BatchNorm` is placed before or after the `ReLU`?

You will most likely see a different performance depending on where you place the batchnorm layer, since the input activation will have a different distribution.
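A small sketch of why the distributions differ (sizes here are arbitrary placeholders): with batchnorm before the `ReLU`, the `ReLU` sees a roughly zero-mean input; with batchnorm after, the batchnorm sees a clipped, non-negative input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
bn = nn.BatchNorm1d(64)
x = torch.randn(128, 64) * 3 + 1   # activations with non-zero mean

pre = F.relu(bn(x))    # BN before ReLU: ReLU receives ~zero-mean input
post = bn(F.relu(x))   # BN after ReLU: BN receives a non-negative input

print(F.relu(x).mean())  # positive: the input BN sees in the second case
```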

So… where should I place the `BatchNorm` layer to train a high-performance model?
(Not only for models with linear layers, but also CNNs or RNNs.)

1. Between each layer?
2. Just before or after the activation function layer?
3. And where shouldn’t I place the `BatchNorm` layer?

@shirui-japina In general, the batch norm layer is usually added before the ReLU (as mentioned in the Batch Normalization paper). However, there is no real standard being followed as to where to add a batch norm layer. You can experiment with different settings, and you may find different performance for each one.

As far as I know, you will generally find batch norm in the feature extraction branch of a network and not in its classification branch (`nn.Linear`).


Thanks for your reply. So the placement of the `BatchNorm` layer in a CNN is like this:

```
CNN(
    convolution-layer-1,
    batch-norm-layer-1,
    activation-layer (ReLU),

    convolution-layer-2,
    batch-norm-layer-2,
    activation-layer (ReLU),

    fully-connected-layer,
)
```
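The layout above could be sketched in PyTorch roughly as follows; the channel counts, kernel sizes, and 8×8 input resolution are arbitrary placeholders, not values from the thread.

```python
import torch
import torch.nn as nn

# conv -> batch-norm -> ReLU blocks, followed by a fully connected layer
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),  # assumes 8x8 input images, 10 classes
)

out = cnn(torch.randn(4, 3, 8, 8))
print(out.shape)  # torch.Size([4, 10])
```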

Should we place the `BatchNorm` layer before the pooling layer?

If you ask me, I would place it after the pooling layer. But you can check out how vision models are implemented in PyTorch to get clarity.


Got it, thanks for your help.

Hi ptrblck,

Sorry to take your time. I have a question: I normalized my patches before training, and my network is 2 CNN layers with 2 fully connected layers. Is it necessary to do batch normalization, or is it unnecessary since the layers are not very deep?

My best advice is to try out both approaches and compare the validation accuracy with and without batchnorm layers.
I don’t have specific advice on when to use them with respect to the number of layers. Let us know which model worked better!

PS: Also, compare the training and validation accuracy to pick the right model, not the test accuracy, as you would otherwise leak the test data information into your model selection process.


You most likely will not see a drastic change in network performance (higher accuracy, etc.). However, batchnorm incurs around a 30% overhead in your network runtime. It will affect your training as well as your inference, unless you fuse the batchnorm layers at inference time.
All in all, BatchNorm shines when you have a very deep architecture; what you have there is not really considered that deep.
You may very well update us with the result you get, though.
Cheers.
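As a sketch of the fusion mentioned above: PyTorch ships a helper, `torch.nn.utils.fusion.fuse_conv_bn_eval`, that folds an eval-mode batchnorm's statistics into the preceding conv's weight and bias, removing the batchnorm op at inference; the layer sizes below are arbitrary placeholders.

```python
import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(8)
conv.eval()
bn.eval()  # fusion is only valid in eval mode (uses running stats)

fused = fuse_conv_bn_eval(conv, bn)  # folds BN stats into conv weight/bias

x = torch.randn(2, 3, 16, 16)
# The single fused conv matches conv -> bn up to floating-point error,
# so the BatchNorm runtime overhead disappears at inference.
print(torch.allclose(fused(x), bn(conv(x)), atol=1e-6))
```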