Performance drops dramatically when switching to model.eval()

Dear all,
I am using PyTorch 1.3.1 to train a DeepLab-like ResNet-50 model with two dataloaders, one for the source domain and one for the target domain, with batch_size=8.

I am training on a V100 32GB GPU. In model.train() mode, the performance of the model (e.g. the Dice score) on both domains keeps getting better. But when I switch from model.train() to model.eval() to evaluate on the source/target domain training/validation sets, something strange happens: the performance drops to 0! By the way, if I test the trained model on a small batch (e.g. 4 or 8 images), model.train() mode gives normal performance but model.eval() mode gives 0.

After debugging, I found that the problem is caused by the BatchNorm2d layers. If I evaluate in model.train() mode on any training or validation set, everything is OK. But as soon as I set model.eval(), the problem comes back.
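For anyone who wants to see the mechanism in isolation, here is a minimal sketch (not the model from this post) of why a BatchNorm2d layer can behave so differently in the two modes. In train mode it normalizes with the statistics of the current batch, while in eval mode it uses the stored running_mean/running_var. The input tensor and its scale/offset are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fresh layer: running_mean starts at 0 and running_var at 1.
bn = nn.BatchNorm2d(3)

# Hypothetical input whose statistics are far from the initial running stats.
x = torch.randn(8, 3, 16, 16) * 5.0 + 10.0

bn.train()
out_train = bn(x)  # normalized with this batch's mean/var; running stats get one update

bn.eval()
out_eval = bn(x)   # normalized with the stored running_mean/running_var

print("train-mode output mean:", out_train.mean().item())
print("eval-mode output mean: ", out_eval.mean().item())
print("running_mean after one update:", bn.running_mean)
```

If the running statistics do not match the data seen at evaluation time, the eval-mode activations end up shifted and scaled very differently from what the following layers saw during training, which is one way the output can collapse.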

I also found that other PyTorch users have run into this problem, but their solutions, such as setting track_running_stats=False when creating the BN layers, not reusing BN layers, or using a large batch_size, did not solve it for me.
Performance highly degraded when eval() is activated in the test phase
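For reference, this is a small sketch of what the track_running_stats=False workaround does (it did not fix my case, as noted above). With this flag the layer keeps no running statistics at all, so it normalizes with batch statistics even in eval mode. The input tensor is made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# No running_mean/running_var buffers are kept with this flag.
bn = nn.BatchNorm2d(3, track_running_stats=False)
has_running_stats = bn.running_mean is not None

# Hypothetical input batch with a large offset and scale.
x = torch.randn(8, 3, 16, 16) * 5.0 + 10.0

bn.eval()
out = bn(x)  # still normalized with the statistics of this batch

print("keeps running stats:", has_running_stats)
print("eval-mode output mean:", out.mean().item())
```

Note that the output for one image then depends on the other images in the same batch, so results can change with the evaluation batch size.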

So if anyone else has been stuck on this problem, I would be very happy to hear any suggestions.

Best, dong.

Frankly, I am doing an unsupervised domain adaptation (UDA) project, and I have just figured out what is influencing the performance when using BatchNorm layers.

Here is a link to the solution, which is helpful for UDA projects that use BatchNorm layers.
Possible Issue with batch norm train/eval modes

In short, this answer helped me a lot.

But doing this feels strange, because when we deploy a model pretrained on the training set, we do not want to change the parameters of the pretrained model, so I am exploring other solutions.
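One direction I can sketch here (an assumption on my side, not necessarily what the linked answer proposes) is to re-estimate only the BatchNorm running statistics on target-domain data, on a copy of the model, and check that the learnable weights stay identical. The small network and the random "target" batches below are hypothetical stand-ins.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real segmentation network.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
)

# Hypothetical target-domain batches (random tensors here).
target_batches = [torch.randn(8, 3, 16, 16) * 3.0 + 2.0 for _ in range(5)]

adapted = copy.deepcopy(model)  # the original model is left untouched

for m in adapted.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.reset_running_stats()  # back to mean 0, var 1
        m.momentum = None        # cumulative average over all batches seen

adapted.train()                  # BN layers update their running stats in this mode
with torch.no_grad():            # no gradients and no optimizer step
    for xb in target_batches:
        adapted(xb)
adapted.eval()

weights_same = all(
    torch.equal(p, q) for p, q in zip(model.parameters(), adapted.parameters())
)
stats_changed = not torch.equal(model[1].running_mean, adapted[1].running_mean)
print("learnable weights unchanged:", weights_same)
print("BN running stats changed:   ", stats_changed)
```

Only the BN buffers differ between the two copies, so the pretrained model can still be kept as is. Note that adapted.train() also switches on layers such as Dropout during the statistics pass, if the real network has any.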