Using BatchNorm gives significantly lower accuracy

I am working on an image classification problem on a custom dataset consisting of 1888 training images and 800 validation images (8 classes). I have tried applying transfer learning using various models from the torchvision.models library. For each model, I am using pre-trained weights and training only the final Linear layer, which performs the classification. I have the following validation accuracies so far (using batch size 32 and SGD with momentum as the optimizer, with lr 0.001):

1. AlexNet - 93
2. VGG16 - 93
3. VGG16_bn - 57
4. ResNet50 - 26
5. VGG19 - 91
6. VGG19_bn - 59
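
For reference, the setup described above (freeze the pre-trained weights, train only the final Linear layer with SGD) could be sketched roughly as follows. This is only an illustration, not the code from the notebook: `freeze_and_replace_head` and `TinyVGGLike` are hypothetical names, the tiny model is a stand-in so the sketch runs without downloading weights, and `momentum=0.9` is an assumed value since the post does not state it.

```python
import torch
import torch.nn as nn


def freeze_and_replace_head(model: nn.Module, num_classes: int) -> nn.Linear:
    """Hypothetical helper: freeze all parameters, then replace the last
    Linear layer of model.classifier with a new, trainable one."""
    for p in model.parameters():
        p.requires_grad = False
    old_head = model.classifier[-1]
    new_head = nn.Linear(old_head.in_features, num_classes)
    model.classifier[-1] = new_head  # new layer has requires_grad=True by default
    return new_head


class TinyVGGLike(nn.Module):
    """Stand-in with a features/classifier layout similar to torchvision's VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1000),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyVGGLike()
head = freeze_and_replace_head(model, num_classes=8)

# Only the new head is handed to the optimizer.
# lr comes from the post; momentum=0.9 is an assumption.
optimizer = torch.optim.SGD(head.parameters(), lr=0.001, momentum=0.9)
```

Note that setting `requires_grad = False` only freezes the learnable weights. Models such as ResNet expose the final layer as `fc` rather than `classifier[-1]`, so the helper would need adapting there.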

I have repeated the experiments with both model.train() and model.eval(), but the results do not change much. From these results, my guess is that the models with BatchNorm layers perform much worse than the models without them. Any ideas why this might be happening? Any help would be appreciated. Thanks!

I have looked at this post on the forums about a possible solution which involves increasing the momentum value in the BatchNorm constructor.
How exactly am I supposed to make this change? Do I have to manually change the code or is there any better way to make this happen?

Do you see increased accuracy if you run the evaluation in model.train() mode, so that the batchnorm layers use the batch statistics?
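
To make the difference between the two modes concrete, here is a small sketch on a single BatchNorm2d layer (the input tensor is made up for illustration). It also shows that freezing the parameters with requires_grad=False does not stop the running statistics from being updated in train() mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False  # freezes weight/bias only, not the running stats

# Made-up batch whose statistics differ from the initial running stats (mean 0, var 1).
x = torch.randn(32, 3, 8, 8) * 5 + 2

before = bn.running_mean.clone()

bn.eval()
out_eval = bn(x)  # normalized with running_mean / running_var
unchanged_in_eval = torch.equal(before, bn.running_mean)

bn.train()
out_train = bn(x)  # normalized with this batch's own mean / var
changed_in_train = not torch.equal(before, bn.running_mean)

print("eval output mean:", out_eval.mean().item())
print("train output mean:", out_train.mean().item())
print("running stats unchanged in eval:", unchanged_in_eval)
print("running stats changed in train:", changed_in_train)
```

If the pre-trained running statistics do not match the new dataset, the two modes can produce quite different activations, which is why comparing them is a useful first check.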

You can change the momentum by directly assigning the new value to this attribute:

model = ... = new_value
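
If the model contains many batchnorm layers, one way to apply this to all of them is to loop over model.modules(). The following is a minimal sketch; `set_bn_momentum` is a hypothetical helper name and the small Sequential model is only a stand-in for the real network.

```python
import torch.nn as nn


def set_bn_momentum(model: nn.Module, momentum: float) -> int:
    """Set `momentum` on every BatchNorm layer and return how many were changed."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    count = 0
    for module in model.modules():
        if isinstance(module, bn_types):
            module.momentum = momentum
            count += 1
    return count


# Stand-in model with two batchnorm layers.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 8, 3), nn.BatchNorm2d(8),
)
num_changed = set_bn_momentum(net, 0.5)
print(num_changed, "batchnorm layers updated")
```

Keep in mind that in PyTorch the running estimate is updated as (1 - momentum) * running + momentum * batch_value, so a larger momentum gives more weight to the most recent batch.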

@ptrblck, thanks for the help. I have summarised the results I obtained. Basically, train() and eval() make little difference, and changing the momentum value from the default 0.1 to 0.5 did not make much of a difference either. I am sharing the loss and accuracy plots from training.

Normal VGG16 without batchnorm
[plots: vgg16_loss, vgg16_acc]

VGG16 with batchnorm, set to eval()
[plots: vgg16_bn_eval_default_momentum_loss, vgg16_bn_eval_default_momentum_acc]

VGG16 with batchnorm, set to train()
[plots: vgg16_bn_train_default_momentum_loss, vgg16_bn_train_default_momentum_acc]

VGG16 with batchnorm, set to eval() and momentum changed to 0.5
[plots: vgg16_bn_eval_5_momentum_loss, vgg16_bn_eval_5_momentum_acc]

VGG16 with batchnorm, set to train() and momentum changed to 0.5
[plots: vgg16_bn_train_5_momentum_loss, vgg16_bn_train_5_momentum_acc]

So, from the graphs, it’s pretty clear that the BatchNorm model does not improve, regardless of the momentum value or the mode. Any ideas what could be causing the problem?

I have also shared a link to the Google Colab notebook I used for training here, in case it helps with debugging.