Hi everyone,
I’m currently trying to reproduce previously published top-1 accuracies on the ImageNet validation set using pre-trained models whose weights have been made publicly available. My results are pretty close to the published ones, but they are still off by a small margin.
As a first sanity check to find a potential bug, I tried to reproduce the reported top-1 accuracies for ResNet50 and VGG16 (w/o batchnorm) by simply evaluating the two PyTorch models with `pretrained=True` on the ImageNet validation set. I get top-1 accuracies of 76.02 (vs. 76.130 reported) for ResNet50 and 71.622 (vs. 71.592 reported) for VGG16.
I really wonder why I cannot reproduce the results exactly, since simply evaluating pre-trained networks should be a deterministic process.
I double-checked the following things which I thought could cause the slight difference:
- made sure that data pre-processing is correct (i.e. transformations and normalization)
- set a random seed using `torch.manual_seed()`
- used `model.eval()` and `torch.no_grad()` in the test/eval loop
Any idea what I might be missing?
Thanks in advance.