Data Transformation for Training and Validation Data

I am a beginner in PyTorch and I am currently going through the official tutorials.

In the Transfer Learning tutorial, the author used different transformation steps for Training and Validation data.

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

I wanted to understand the intuition behind this decision.

The general approach is to use data augmentation during training to artificially create "new" samples of your data, and to use a fixed, deterministic pre-processing pipeline for your validation/test case.
As you can see, the training transformation contains random transforms, e.g. RandomResizedCrop, which is fine for training, but could yield different predictions for the same sample in your test dataset.
It is therefore preferred to use the non-random versions of the transformations for validation/test to get consistent predictions.
