I am training a CNN regressor and am experimenting with a very basic network (shown below).
self.features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 16, kernel_size=5),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
)
With this network, the model does not learn at all: both training and test loss are very high and decrease very little during training.
When I add a BatchNorm2d() layer after each conv layer, the model overfits heavily.
self.features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),
    nn.BatchNorm2d(8),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 16, kernel_size=5),
    nn.BatchNorm2d(16),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
)
Is there a reason why BatchNorm2d causes the model to overfit so heavily? Am I applying BatchNorm correctly? Regardless of whether I double the amount of data or shrink the network further, BatchNorm2d() leads to heavy overfitting. So I am curious how to apply regularization together with BatchNorm.
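For context, here is a minimal sketch of the kind of regularized setup I am asking about: the same two-conv feature extractor with BatchNorm, plus the two standard regularizers I am aware of (weight decay on the optimizer and Dropout in the regression head). All hyperparameter values, the dropout rate, and the input size below are placeholders, not my actual training configuration.

```python
import torch
import torch.nn as nn

# Same feature extractor as above, with BatchNorm after each conv.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),
    nn.BatchNorm2d(8),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 16, kernel_size=5),
    nn.BatchNorm2d(16),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.ReLU(inplace=True),
)

# Hypothetical regression head: Dropout as an explicit regularizer,
# LazyLinear so the input feature count is inferred on first call.
head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),      # placeholder dropout rate
    nn.LazyLinear(1),       # single regression output
)
model = nn.Sequential(features, head)

# weight_decay adds L2 regularization on top of BatchNorm.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(4, 3, 64, 64)   # dummy batch of 64x64 RGB images
out = model(x)
print(out.shape)                # torch.Size([4, 1])
```

Is this the right way to combine these regularizers with BatchNorm, or is one of them redundant?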