Testing accuracy gap when training a resnet50 on ImageNet from scratch

Hi all,

I’m currently interested in reproducing some baseline image classification results using PyTorch.
My goal is to get a resnet50 model to reach a test accuracy as close as possible to the one reported in torchvision: torchvision.models — Torchvision 0.8.1 documentation (i.e. 76.15 top 1 accuracy).
In order to do that, I closely follow the setup from the official PyTorch examples repository: examples/main.py at master · pytorch/examples · GitHub.
Namely, I set:

  • seed=19
  • batch_size=256
  • lr=0.1
  • weight_decay=1e-4
  • SGD is using momentum=0.9
  • LR scheduler is StepLR, which divides the learning rate by 10 every 30 epochs
  • I train for 100 epochs (as opposed to 90 in the code above)
  • I use exactly the same data augmentation as the code above
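
To make the schedule in the list above concrete, here is a minimal pure-Python sketch of the learning rate each epoch would see. It is not the code from the examples repository or from the post; the function name `step_lr` and its parameter names are hypothetical, and it only assumes the closed form "base rate times 0.1 for every 30 completed epochs" with 0-indexed epochs.

```python
def step_lr(epoch, base_lr=0.1, gamma=0.1, step_size=30):
    """Learning rate in effect during a given 0-indexed epoch.

    Assumes a step schedule: the rate is multiplied by `gamma`
    once for every `step_size` epochs already completed.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step_size)


# Epochs at which the rate changes during a 100-epoch run.
num_epochs = 100
change_points = [
    epoch
    for epoch in range(1, num_epochs)
    if step_lr(epoch) != step_lr(epoch - 1)
]

for epoch in [0] + change_points:
    print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.4g}")
```

With 100 epochs instead of 90, the last 10 epochs run at a fourth, smaller learning rate (after the decay at epoch 90), which a 90-epoch run never reaches.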

The only difference is that I’m leveraging PyTorch Lightning to seamlessly use 4 GPUs in Distributed Data Parallel mode on a single node.
However, I am only able to achieve 73.12 top 1 accuracy. I don’t want to draw conclusions on my other experiments given this gap on the standard baseline.

My question: has anyone tried and reproduced the torchvision numbers using the setup I described above?
From my reading of the resnet models source code, the pretrained weights may have been obtained by following this setup: NVIDIA NGC, where all the hyperparameters have been thoroughly tuned. Can someone confirm this? If so, what top 1 accuracy should I expect on the val set when using a simpler setup (the one described above)?

Have a good day!

I put together a “Minimal Working Example” to reproduce my results here: GitHub - inzouzouwetrust/resnet50-imagenet-baseline: Image classification baseline using ResNet50 on ImageNet.
Let me know if that helps!


I may have found the root cause for the test performance discrepancy.

In my implementation, the total batch size was 1024, since 4 processes were spawned and each one used a batch size of 256. In the official PyTorch example, each process uses bs=256/N, where N is the number of processes (4 here). This means I have to either adjust the batch size (i.e. set it to 64 per process) or tune the learning rate accordingly (i.e. set it higher initially, e.g. 0.4 when using 256 images per process).
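
The arithmetic behind the two options can be sketched as below. This is only an illustration in plain Python: the function names are hypothetical, and the learning-rate adjustment assumes the common linear scaling heuristic (learning rate proportional to total batch size), which the post does not state explicitly but which matches the 0.4 figure mentioned above.

```python
def total_batch_size(per_process_batch_size, num_processes):
    """Number of images contributing to one optimizer step across all processes."""
    return per_process_batch_size * num_processes


def per_process_batch_size(target_total, num_processes):
    """Per-process batch size that keeps the total at `target_total`."""
    if target_total % num_processes != 0:
        raise ValueError("target_total must be divisible by num_processes")
    return target_total // num_processes


def linearly_scaled_lr(base_lr, total, reference_total=256):
    """Assumed heuristic: scale the learning rate with the total batch size."""
    return base_lr * (total / reference_total)


num_processes = 4

# What happened: 256 images per process on 4 processes.
actual_total = total_batch_size(256, num_processes)

# Option 1: keep lr=0.1 and shrink the per-process batch size.
option_1_batch = per_process_batch_size(256, num_processes)

# Option 2: keep 256 images per process and raise the initial lr.
option_2_lr = linearly_scaled_lr(0.1, actual_total)

print(actual_total, option_1_batch, option_2_lr)
```

Either option keeps the ratio of learning rate to total batch size the same as in the single-process reference setup (0.1 for 256 images).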

I will keep this post updated once I get the final results.


I am trying to figure out the accuracy gap between the pytorch model accuracies and those in the resnet50 paper. The published accuracies are slightly better than the pytorch ones. I was wondering if that was due to changes in the ImageNet datasets used for training. Did you use the ImageNet 2012 training and validation datasets for your runs?

Most of the hyperparameters you posted are similar to those of the resnet paper. I am curious to know whether your final results are closer to the pytorch implementation or to the resnet paper, and whether you were able to close the accuracy gap. Thanks!

Hey @ashish007git,

Sorry for the late reply, I was pretty busy :)
If you look directly at the torchvision.models source code (here), you can see some comments explaining that some implementation details lead to a better accuracy.
Besides, if you take a look at NVIDIA NGC, you will see that their training routine differs slightly from the one published in the original paper. This way you can squeeze out that last 0.1%.
My understanding is that the pretrained weights from torchvision.models are obtained by performing the training procedure linked above.

Have a nice day!