Output varies when changing batch size (during test)

I’m using a resnet18 network from torchvision.models.

At test time, I observe that if I alter the batch size of the data loader, the accuracy of the model on the test data changes. I cannot understand why this should happen. Isn’t it the case that the weights of the network are fixed during testing (I haven’t called optimizer.step() anywhere)? I also went through ResNet’s architecture, and no layer produces randomized output.

Any pointers to where I could be wrong in my code or my understanding?

I think it’s because of the BatchNorm layer. The effect was more visible with smaller batch sizes. Accuracy dropped from 99.7% for a batch size of 25 to 14.85% for a batch size of 1.
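To illustrate why batch statistics could cause this, here is a minimal pure-Python sketch of the normalization step. It is a scalar toy, not PyTorch's implementation: `normalize`, `batch_stat_norm`, the `EPS` value, and the sample numbers are all assumptions made for illustration. A real BatchNorm2d layer computes per-channel statistics over the batch and the spatial dimensions, so a batch of 1 is noisy rather than exactly degenerate.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # small constant for numerical stability (value is an assumption)

def normalize(values, mean, var):
    """Normalize each value with the given mean and variance."""
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]

def batch_stat_norm(batch):
    """Training-mode behaviour: statistics come from the batch itself."""
    return normalize(batch, fmean(batch), pvariance(batch))

sample = 2.0
small_batch = [sample, 4.0]
large_batch = [sample, 4.0, 6.0, 8.0]

# The same sample is mapped to different values depending on its batch.
out_small = batch_stat_norm(small_batch)[0]
out_large = batch_stat_norm(large_batch)[0]
print(out_small, out_large)

# With fixed statistics the output no longer depends on the batch.
fixed_mean, fixed_var = 5.0, 5.0  # hypothetical stored statistics
out_fixed_small = normalize(small_batch, fixed_mean, fixed_var)[0]
out_fixed_large = normalize(large_batch, fixed_mean, fixed_var)[0]
print(out_fixed_small == out_fixed_large)

# In this scalar toy, a batch of one value has zero variance and maps to 0.
print(batch_stat_norm([sample])[0])
```

In this toy, the sample's normalized value changes with the batch it sits in, which would be consistent with accuracy changing as the batch size changes.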

Have you set the model to evaluation mode with model.eval()? The batch size should not change the predictions!

Thanks! After model.eval() is called, the model starts using the running mean and variance for normalization. I had never read this part of the documentation.
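For anyone who wants to check this behaviour, here is a small sketch. It uses a tiny stand-in model rather than resnet18, and the helper `first_image_output`, the input shapes, and the tolerance are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model (an assumption; the thread uses resnet18).
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
)
images = torch.randn(8, 3, 16, 16)

def first_image_output(batch_size):
    """Run the first `batch_size` images and return the output for image 0."""
    with torch.no_grad():
        return model(images[:batch_size])[0]

# Training mode: batch statistics, so image 0 depends on the rest of the batch.
model.train()
train_differs = not torch.allclose(
    first_image_output(1), first_image_output(8), atol=1e-4
)

# Evaluation mode: running statistics, so image 0 is independent of batch size.
model.eval()
eval_matches = torch.allclose(
    first_image_output(1), first_image_output(8), atol=1e-4
)

print(train_differs, eval_matches)
```

Note that torch.no_grad() only disables gradient tracking. It does not switch BatchNorm to its running statistics, so model.eval() is still needed.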

Hey! I have the same issue with my code. The results are very bad with a batch size of 1, which makes evaluating a single image impractical.

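I call model.eval() and also run

    import torch.nn as nn

    # model.modules() also visits nested layers; model.children() only sees the top level
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.track_running_stats = False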

EDIT: does eval() use the running stats without saving the variance and mean? Are the variance and mean supposed to scale with the batch size?
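One way to look into the first question is to inspect the layer's buffers directly. This is only a sketch on a standalone nn.BatchNorm2d layer with default settings. The input tensor and variable names are assumptions, and a layer whose track_running_stats flag was changed after construction may behave differently depending on the PyTorch version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
x = torch.randn(4, 3, 8, 8) + 5.0  # inputs whose mean is far from 0

# The running statistics are buffers, so they appear in the state dict.
print(sorted(bn.state_dict().keys()))

# Training mode: a forward pass updates the running mean.
bn.train()
before = bn.running_mean.clone()
bn(x)
updated_in_train = not torch.equal(before, bn.running_mean)

# Evaluation mode: a forward pass reads the running mean but leaves it unchanged.
bn.eval()
before = bn.running_mean.clone()
bn(x)
unchanged_in_eval = torch.equal(before, bn.running_mean)

print(updated_in_train, unchanged_in_eval)
```

If the running statistics of a trained model still look like their initial values (zeros for the mean, ones for the variance), that might be worth checking before blaming the batch size.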

Any ideas? The network is a stock ResNet adapted for regression instead of classification.

This one-liner basically saved my day. Thanks a lot!