BatchNorm2d in a CNN regressor

I am training a CNN regressor, experimenting with a very basic network (shown below).

        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(8, 16, kernel_size=5),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
        )

With this network, the model does not learn anything: both training and test losses stay very high and decrease very little over training.

When I add a BatchNorm2d layer after each conv layer, the model overfits heavily.

        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5),
            nn.BatchNorm2d(8),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(8, 16, kernel_size=5),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
        )

Is there a reason why BatchNorm2d causes the model to overfit heavily? Am I applying BatchNorm correctly? Whether I double the amount of data or shrink the network further, BatchNorm2d still causes heavy overfitting, so I am curious how to apply regularization alongside BatchNorm.

  1. The order seems right (a concrete sketch follows this list):
    CONV/FC → BatchNorm → activation → Dropout → CONV/FC → …
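
For concreteness, here is a minimal sketch of that ordering applied to your second network, with Dropout2d added as one common regularizer that combines with BatchNorm. The channel sizes are taken from your snippet; the dropout probability p=0.25 is just an illustrative value:

    import torch.nn as nn

    # Sketch of the CONV -> BatchNorm -> activation -> Dropout ordering.
    # Channel sizes follow the snippets above; p=0.25 is illustrative.
    features = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=5),
        nn.BatchNorm2d(8),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p=0.25),
        nn.MaxPool2d(kernel_size=2, stride=2),

        nn.Conv2d(8, 16, kernel_size=5),
        nn.BatchNorm2d(16),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p=0.25),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

Weight decay set on the optimizer (e.g. `torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)`) is another standard regularizer that works alongside BatchNorm.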

There are too few details here to say for sure, but if your training and test data come from different distributions, BatchNorm can appear to cause overfitting: the moving-average statistics it uses at evaluation time fit the training data and not the test data.
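
A quick way to check for this is to compare the test loss with BatchNorm using per-batch statistics (train mode) versus its stored running averages (eval mode); a large gap points at a statistics mismatch rather than classic overfitting. A minimal sketch, assuming `model` and `test_loader` already exist and using MSE as the regression loss:

    import torch
    import torch.nn.functional as F

    def test_loss(model, loader, use_batch_stats):
        # train(True) makes BN use per-batch stats; train(False) equals eval()
        model.train(use_batch_stats)
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                total += F.mse_loss(model(x), y, reduction="sum").item()
                n += y.numel()
        return total / n

    # Evaluate eval mode first: forward passes in train mode update the
    # running averages as a side effect, even under no_grad().
    print("running averages:", test_loss(model, test_loader, False))
    print("batch statistics:", test_loss(model, test_loader, True))

If the eval-mode loss is much worse, the running averages fit the training distribution but not the test distribution, which can look like heavy overfitting.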