Bad predictions for batches with model.eval()

Please reproduce the behaviour and explain why this happens.

import torchvision.models as models

resnet50 = models.resnet50(pretrained=True) # load pretrained model
resnet50.eval() # set model to eval mode

# batch size > 1
for index, batch in enumerate(imagenet_val_loader): # iterate over dataloader
    output = resnet50(batch) # predict for a batch of images
    _, pred = output.topk(1, 1, True, True) # get predicted classes
    print(pred) # show predicted classes

# single image or batch size = 1
for index, image in enumerate(imagenet_val_dataset): # iterate over dataset
    output = resnet50(image) # predict for a single image
    _, pred = output.topk(1, 1, True, True) # get predicted class
    print(pred) # show predicted class

Make sure the loader is created with shuffle=False, so that both loops see the images in the same order.

  1. Do you get the same predictions from the loader and the dataset?
  2. Try again without resnet50.eval() (comment it out). Is the prediction the same as before? If not, is it swapped?

What I observe:
With eval, the model predicts correctly for the images in the dataset loop (single images) and incorrectly for the images in the loader loop (batches). Without eval, the behaviour is reversed.

What I expect:
Without eval, it works for batches and not for individual images. (Not ideal, but I understand why - no problem here.)
With eval, the model should predict correctly for single images (it does) and for batches of images (it does not).

I cannot reproduce the issue. I’ve used two DataLoaders with a batch size of 10 and a batch size of 1, respectively. Both loops yield the same results using model.eval().
I haven’t used model.train(), since in that mode e.g. the BatchNorm layers normalize with the statistics of the current batch and update their running estimates, which might yield different results.
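
To illustrate the BatchNorm point, here is a minimal sketch that compares one batched forward pass with per-sample forward passes in both modes. It uses a single `nn.BatchNorm2d` layer as a stand-in for the BatchNorm layers inside ResNet-50, and the random input and the helper name `batched_vs_single` are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)             # stand-in for the BatchNorm layers in ResNet-50
batch = torch.randn(10, 3, 8, 8)   # hypothetical batch of 10 "images"

def batched_vs_single(layer, x):
    """Return True if one batched pass matches sample-by-sample passes."""
    with torch.no_grad():
        as_batch = layer(x)
        one_by_one = torch.cat([layer(x[i:i + 1]) for i in range(x.size(0))])
    return torch.allclose(as_batch, one_by_one, atol=1e-6)

# eval mode: the stored running estimates are used, so batch size does not matter
bn.eval()
eval_consistent = batched_vs_single(bn, batch)

# train mode: each forward pass normalizes with the statistics of its own input
# (and also updates the running estimates, even under no_grad)
bn.train()
train_consistent = batched_vs_single(bn, batch)

print("eval mode consistent: ", eval_consistent)
print("train mode consistent:", train_consistent)
```

In eval mode the two paths should agree, while in train mode they generally will not, which is why I would only compare batch sizes after calling model.eval().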

Using your second approach (loop over Dataset), I would expect to get a size mismatch error, since the batch dimension should be missing. Could you check that?
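
A rough sketch of what I mean: an item taken directly from a Dataset usually has shape [C, H, W], so a batch dimension has to be added with `unsqueeze(0)` before the forward pass. The tiny `nn.Sequential` model and the random image below are hypothetical stand-ins, not ResNet-50 or ImageNet data. Note also that torchvision image datasets typically return an (image, label) tuple, which would need unpacking first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in classifier (hypothetical), used only to keep the example light
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 5),
).eval()

image = torch.randn(3, 32, 32)   # a single Dataset item: [C, H, W]
batched = image.unsqueeze(0)     # add the batch dimension: [1, C, H, W]

with torch.no_grad():
    output = model(batched)              # [1, num_classes]
_, pred = output.topk(1, 1, True, True)  # same call as in the original post

print(tuple(batched.shape), tuple(output.shape), tuple(pred.shape))
```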


I tried to isolate the problem by replicating the issue in a new project, and the predictions were consistent, as you say.

When I load the pre-trained model, it is in training mode by default, which gave inconsistent results because of the running estimates. I had assumed that the accuracy in this mode was the expected value, so when I observed a lower accuracy in eval mode, I thought something was wrong. At the time, I was also switching between batch predictions and individual predictions, which led to the confusion.
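
For anyone running into the same thing, the mode can be checked through the `training` flag. A minimal sketch, using a small hypothetical `nn.Sequential` model instead of ResNet-50 to avoid downloading weights:

```python
import torch.nn as nn

# Any freshly constructed nn.Module starts in training mode
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))
mode_after_construction = model.training

# eval() switches the module and all of its submodules to inference behaviour
model.eval()
mode_after_eval = model.training
all_submodules_eval = all(not m.training for m in model.modules())

print(mode_after_construction, mode_after_eval, all_submodules_eval)
```

Loading weights does not change this flag, so model.eval() has to be called explicitly before validating.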

Thank you for checking.

What I still find interesting is that the accuracy drops from 0.703125 (train mode) to 0.515625 (eval mode) for the first batch of validation images in ImageNet. (I hope these numbers are correct)
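
For reference, this is roughly how I would compute the top-1 accuracy of one batch, using the same `topk` call as above. The helper name `top1_accuracy` and the toy logits and labels are made up for illustration and are not ImageNet results. Both reported values are multiples of 1/64, which would be consistent with 45 and 33 correct predictions in a batch of 64, although the batch size was not stated.

```python
import torch

def top1_accuracy(output, target):
    """Fraction of samples whose highest-scoring class equals the label."""
    _, pred = output.topk(1, 1, True, True)   # [N, 1] predicted class indices
    correct = pred.squeeze(1).eq(target)      # [N] boolean mask
    return correct.float().mean().item()

# Toy logits for 4 samples and 3 classes (hypothetical values)
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 1, 2])

acc = top1_accuracy(logits, labels)
print(acc)
```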

Bottom line - the eval mode works consistently as expected.