Unable to reproduce accuracies on ImageNet validation set

Hi everyone,

I’m currently trying to reproduce previously published top-1 accuracies on the ImageNet validation set, using pre-trained models whose weights have been made publicly available. My results are close to the published numbers, but they are consistently off by a small margin.
To find a potential bug, and as a first sanity check, I tried to reproduce the reported top-1 accuracies for ResNet50 and VGG16 (without batch norm) by simply evaluating the two PyTorch models with pretrained=True on the ImageNet validation set. I get a top-1 accuracy of 76.02 (vs. 76.130 reported) for ResNet50 and 71.622 (vs. 71.592 reported) for VGG16.

I’m wondering why I can’t reproduce the results exactly, since simply evaluating pre-trained networks should be deterministic.

I double-checked the following things that I thought could cause the slight difference:

  • made sure the data pre-processing is correct (i.e., transformations and normalization)

  • set a random seed using torch.manual_seed()

  • I’m calling model.eval() and using torch.no_grad() in the test/eval loop

Any idea what I might be missing?
Thanks in advance.