Accuracy on the pre-trained resnets


I am trying to use some of the pretrained CNNs for a project of mine, but the accuracy seems too low. I measured the top-1 accuracy on the ImageNet (ILSVRC-2012) validation set for the smallest and the largest ResNet, and compared my numbers with the results reported by Facebook.

I am only scaling the images (with a random crop or a center crop the results are worse), and these are the accuracies I am getting:

model       mine     facebook
resnet18    0.672    0.6957
resnet152   0.7164   0.7784

I am not doing anything ridiculous like data augmentation at inference time, and I put the net in eval mode before starting the evaluation. I am also using the correct normalization parameters:

r_mean, g_mean, b_mean = (0.485,  0.456, 0.406)
r_std, g_std, b_std = (0.229, 0.224, 0.225)
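To make the arithmetic behind these numbers concrete, here is a minimal plain-Python sketch of what they do to a single pixel. It assumes the image has first been converted to floats in [0, 1] (torchvision's ToTensor does this for 8-bit images), since the mean and std values are expressed on that scale. The helper name normalize_pixel and the sample pixel values are made up for illustration.

```python
# ImageNet channel statistics quoted above (RGB order, for pixels scaled to [0, 1]).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Hypothetical helper: normalize one 8-bit RGB pixel.

    Step 1: divide by 255 to reach the [0, 1] range.
    Step 2: apply (x - mean) / std per channel.
    """
    scaled = [c / 255.0 for c in rgb_uint8]
    return tuple((x - m) / s for x, m, s in zip(scaled, MEAN, STD))


# A pixel close to the dataset mean ends up near zero in every channel.
near_mean = normalize_pixel((124, 116, 104))

# Pure black maps to -mean / std in every channel.
black = normalize_pixel((0, 0, 0))
```

If the division by 255 is skipped, the inputs land far outside the range the network saw during training, so it is worth checking that step whenever accuracy looks off.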

For what it's worth, here is the code I am using for dataset preparation:

transform = torchvision.transforms.Compose([
    torchvision.transforms.Scale((224, 224)),
    torchvision.transforms.ToTensor(),  # Normalize operates on tensors, not PIL images
    torchvision.transforms.Normalize(mean=(r_mean, g_mean, b_mean),
                                     std=(r_std, g_std, b_std))])

and here is the code for testing:

correct = 0
total = 0
for data in test_loader:
    inputs, labels = data
    inputs, labels = inputs.cuda(), labels.cuda()
    outputs = net(torch.autograd.Variable(inputs))
    _, predicted = torch.max(outputs.data, 1)  # index of the highest score = predicted class
    total += labels.size(0)
    correct += (predicted == labels).sum()

accuracy = correct / float(total)

Am I making a mistake (or missing something), or is it just that the PyTorch resnets are not trained as well as the Facebook resnets?


What you report is also quite different from what is written in the PyTorch documentation:

I don’t know which dataset split those scores are for though. I would also like to test this when I have time. Maybe @fmassa and @smth can comment on this.


Reading the code on the link here:

it seems that the problem was the preprocessing: instead of scaling or center cropping directly to 224, I needed to scale to 256 first and then take a 224 center crop. With that change, the resnets match the reported accuracy.

The code for that:

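# normalize is transforms.Normalize with the mean/std values given earlier
val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(valdir, transforms.Compose([
        transforms.Scale(256), transforms.CenterCrop(224),
        transforms.ToTensor(), normalize])),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)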
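To see why this differs from scaling straight to (224, 224), here is a rough plain-Python sketch of the geometry only. It assumes that scaling with a single integer resizes the shorter side to that value and keeps the aspect ratio, whereas a (224, 224) tuple squashes the whole image into a square. The function names and the 500x375 example image are made up, and the rounding may not match torchvision exactly.

```python
def resize_shorter_side(width, height, size):
    """Return (width, height) after scaling the shorter side to `size`.

    The aspect ratio is kept; the longer side is truncated to an int.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# Example: a 500x375 landscape image (sizes chosen purely for illustration).
resized = resize_shorter_side(500, 375, 256)
box = center_crop_box(resized[0], resized[1], 224)
```

In this example the network sees the undistorted central region of the image, with objects at a scale closer to what was used at training time, rather than the full frame squeezed into a square.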

I used the same ratio between Scale and CenterCrop for Inception net (there the input size is supposed to be 299 pixels, rather than 224) and it seems to work quite well too.
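For reference, keeping the same ratio works out as below. This is just the arithmetic; the helper name resize_for_crop is made up, and rounding to the nearest integer is my own assumption.

```python
def resize_for_crop(crop_size, base_resize=256, base_crop=224):
    """Hypothetical helper: scale size that keeps the 256:224 ratio."""
    return round(crop_size * base_resize / base_crop)


# Inception expects 299x299 inputs, so scale to this size before cropping.
inception_resize = resize_for_crop(299)
```

So for Inception this gives a scale to 342 followed by a 299 center crop.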
