How do I replicate the exact preprocessing for ImageNet 2012 validation?

leeks · February 22, 2018, 2:51am

Using the default transformation which is provided here:

transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                         std = [ 0.229, 0.224, 0.225 ]),
])

Source: https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62

I am unable to reproduce the validation accuracy for Inception V3. In fact, the accuracy is lower by 6%. The problem lies mainly in the cropping: the 78.0% accuracy reported by Google uses a central crop with a fixed proportion of 0.875, followed by resizing the image with bilinear interpolation. I am able to reproduce Google’s result with TensorFlow but not with PyTorch. As far as I can tell, the center-crop transformation in PyTorch requires a specific size to crop to, not a fraction. Is there a way to work with fractions instead?

Specific transformation from TF: https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/inception_preprocessing.py#L243-L281

Note that I think the transformation is at fault here, not the model itself, since after changing to a central crop I get 76% accuracy, which is close but not close enough for reproducibility purposes. I’m very new to PyTorch (first day using it), so it would be great if someone could point me to any existing tools that already do this.

That said, PyTorch really makes life easy for us researchers and it’s a wonderful tool. Thanks a lot!

SimonW · February 22, 2018, 5:30am

Unfortunately, it seems that only a size is supported at the moment. However, since the size is computable from the proportion, you can compute it yourself and pass that in.
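A rough sketch of that computation, assuming the usual resize-then-center-crop pipeline: if the final crop should cover a given fraction of the resized image's shorter side, the resize target is the crop size divided by the fraction. The helper name `resize_size_for_fraction` is hypothetical, not a torchvision function.

```python
def resize_size_for_fraction(crop_size, fraction):
    """Return the resize target (shorter side, in pixels) such that a center
    crop of `crop_size` covers `fraction` of that side.

    Hypothetical helper for illustration; not part of torchvision.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Round to the nearest whole pixel, since image sizes are integers.
    return int(round(crop_size / fraction))


# Resize targets for two common crop sizes at a central fraction of 0.875.
size_for_224 = resize_size_for_fraction(224, 0.875)
size_for_299 = resize_size_for_fraction(299, 0.875)
```

The resulting values could be passed to `transforms.Resize` ahead of a `transforms.CenterCrop` of the matching crop size. This keeps the aspect ratio and crops a square, so it is not guaranteed to match the TensorFlow preprocessing pixel for pixel.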

ignacio-rocco · February 22, 2018, 9:38am

Check the updated ImageNet main.py here:

https://github.com/pytorch/examples/blob/master/imagenet/main.py


    val_loader = torch.utils.data.DataLoader(
        datasets.ImageFolder(valdir, transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ])),

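Here it does a center crop instead of a random crop as in the code you posted. Note that 224/256 = 0.875, the same proportion you mention, although it is applied after resizing the shorter side to 256.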

However, if you want finer control over the cropping, you can write a custom transformation, as shown here:

http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#transforms
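For illustration, a custom transform along those lines might look like the sketch below. It assumes the input is a PIL image (anything exposing `.size` and `.crop(box)`); the class name `CentralFractionCrop` is hypothetical and not part of torchvision, and its rounding may differ by a pixel from TensorFlow's central crop.

```python
class CentralFractionCrop:
    """Crop the central `fraction` of an image along both dimensions.

    Hypothetical helper for illustration. Expects a PIL-style image with a
    `.size` attribute of (width, height) and a `.crop((l, t, r, b))` method.
    """

    def __init__(self, fraction=0.875):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        self.fraction = fraction

    def __call__(self, img):
        width, height = img.size
        # Size of the cropped region, rounded to whole pixels.
        crop_w = int(round(width * self.fraction))
        crop_h = int(round(height * self.fraction))
        # Offsets that center the crop box inside the image.
        left = (width - crop_w) // 2
        top = (height - crop_h) // 2
        return img.crop((left, top, left + crop_w, top + crop_h))
```

An instance could sit at the start of a `transforms.Compose` list, followed by `transforms.Resize((299, 299))` (bilinear interpolation by default, as far as I know) and `transforms.ToTensor()`.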

leeks · February 22, 2018, 2:42pm

Thanks for the tip. I have made some changes and submitted a pull request here: https://github.com/pytorch/vision/pull/429