I am working on an image classification problem on a Custom Dataset consisting of 1888 training images and 800 validation images(8 classes). I have tried applying transfer learning using various models from the torchvision.models library. For each model, I am using pre-trained weights and am only training the final Linear layer which performs classification. I have the following results so far on my validation set(using batch size 32 and using SGD with momentum as the optimizer with lr 0.001)
1. Alexnet - 93
2. VGG16 - 93
3. VGG16_bn - 57
4. Resnet50 - 26
5. VGG19 - 91
6. VGG19_bn - 59
I have repeated the experiments with both model.train() and model.eval() but the results do not seem to change much. So, from the results, I am guessing that the models having BatchNorm layers are performing very poorly compared to other models that don’t have them. Any ideas why this might be happening? Any help would be appreciated. Thanks!
I have looked at this post on the forums about a possible solution which involves increasing the momentum value in the BatchNorm constructor.
How exactly am I supposed to make this change? Do I have to manually change the batchnorm.py code or is there any better way to make this happen?




