Behavior of BatchNorm2d layer when batch size is 1 and mode is train

Hi! I am facing a strange situation. I am using ResNet-101, which contains BatchNorm layers, for segmentation. The batch size while testing is set to 1. The output is better if I use train() mode than if I use eval() mode. According to the BatchNorm formula, when the batch size is 1, mean[x] = x and Var[x] = 0. Thus the final output of the formula will be beta alone (gamma multiplied by 0 is 0). This should give very bad results, as the input information is not propagated by the BatchNorm layer. Surely I am making some mistake. Any help is appreciated.
Omkar Damle.

In eval mode, BatchNorm uses its running statistics (running_mean and running_var) without updating them.
Usually, using a batch size of 1 shouldn’t be a problem if you set your model to eval.
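
To check the eval-mode behavior, here is a minimal sketch. The layer size, input shape, and variable names are made up for illustration and are not from the original model. It feeds a single input through a fresh nn.BatchNorm2d in eval mode and confirms that the running statistics are unchanged and that the output is normalized with them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy layer: 3 channels. A fresh layer starts with
# running_mean = 0 and running_var = 1.
bn = nn.BatchNorm2d(3)
x = torch.randn(1, 3, 4, 4) * 5 + 2  # a single input (batch size 1)

bn.eval()
mean_before = bn.running_mean.clone()
var_before = bn.running_var.clone()

with torch.no_grad():
    y = bn(x)

    # Normalize by hand with the stored running statistics.
    m = mean_before.view(1, -1, 1, 1)
    v = var_before.view(1, -1, 1, 1)
    expected = (x - m) / torch.sqrt(v + bn.eps)
    expected = expected * bn.weight.view(1, -1, 1, 1) + bn.bias.view(1, -1, 1, 1)

# The forward pass in eval mode leaves the running statistics untouched.
stats_unchanged = (
    torch.equal(bn.running_mean, mean_before)
    and torch.equal(bn.running_var, var_before)
)
# The output matches normalization with the running statistics.
matches_running_stats = torch.allclose(y, expected, atol=1e-6)

print(stats_unchanged, matches_running_stats)
```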

It’s a bit strange, however, that you got better results keeping the model in train mode, since the running statistics of the BatchNorm layers will then be updated with the “batch statistics” of your single input.
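
As a rough illustration of that update, the sketch below runs one forward pass in train mode and compares the running statistics with a hand-computed moving average. It assumes the default momentum of 0.1 and a fresh layer (running_mean = 0, running_var = 1); the shapes and names are again hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)               # default momentum is 0.1
x = torch.randn(1, 3, 4, 4) * 5 + 2  # a single input (batch size 1)

bn.train()
with torch.no_grad():
    bn(x)  # one forward pass in train mode updates the running statistics

# Per-channel statistics of this single input, taken over N, H and W.
batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3))  # torch.var is unbiased by default

# Moving-average update: new = (1 - momentum) * old + momentum * batch_stat
momentum = bn.momentum
expected_mean = (1 - momentum) * torch.zeros(3) + momentum * batch_mean
expected_var = (1 - momentum) * torch.ones(3) + momentum * batch_var

mean_updated = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
var_updated = torch.allclose(bn.running_var, expected_var, atol=1e-5)

print(mean_updated, var_updated)
```

So every test-time forward pass in train mode shifts the stored statistics a little further toward the test inputs.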

How large is the discrepancy between the training and test accuracy?

Won’t the batch statistics be mean = x (the only element) and variance = 0? The training precision and recall are both around 94%. I cannot calculate the test precision and recall, as the ground truth is not publicly available.
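
One way to test this is the small sketch below (toy shapes and hypothetical names, not the actual ResNet-101). nn.BatchNorm2d computes its statistics per channel over the batch and spatial dimensions (N, H, W), so with a batch size of 1 each channel still has H × W values. The variance is therefore generally nonzero, and the output is not just beta.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
x = torch.randn(1, 3, 4, 4) * 5 + 2  # batch size 1, but 4 * 4 = 16 values per channel

bn.train()
with torch.no_grad():
    y = bn(x)

# Per-channel variance of the single input over (N, H, W): not zero.
in_var = x.var(dim=(0, 2, 3), unbiased=False)

# With the default gamma = 1 and beta = 0, each output channel has
# mean close to 0 and variance close to 1, so the input still propagates.
out_mean = y.mean(dim=(0, 2, 3))
out_var = y.var(dim=(0, 2, 3), unbiased=False)

print(in_var)
print(out_mean)
print(out_var)
```

The degenerate case described in the question would need a single value per channel, i.e. an input of shape (1, C, 1, 1).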