Is transforms.RandomResizedCrop used for Data Augmentation?

In the past, I thought transforms.RandomResizedCrop was used for data augmentation because it randomly scales the image, crops it, and then resizes the crop to the requested size. The data augmentation part of my code usually looks like this:

from torchvision import transforms

# ImageNet channel statistics
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([transforms.RandomHorizontalFlip(),
                                      transforms.RandomResizedCrop(224),
                                      transforms.ToTensor(),
                                      normalize])

But I have just checked the manual, and it says:

A crop of random size (default: of 0.08 to 1.0) of the original size

According to this comment, the crop may cover only a very small part of the original image, since the scale ranges from 0.08 to 1.0. That seems like it would be harmful for training.

Do I misunderstand this function? Is it used for data augmentation?
And is it necessary to set the scale parameter explicitly, as follows, if I want to use it for data augmentation?

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([transforms.RandomHorizontalFlip(),
                                      transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
                                      transforms.ToTensor(),
                                      normalize])

Any suggestions would be helpful.
Thank you!

You can use it for data augmentation.
It’s used, for example, in ImageNet training.

The scale might seem a little odd, but have a look at the Inception paper Going deeper with convolutions, section 6:

Still, one prescription that was verified to work very well after the competition includes sampling of various sized patches of the image whose size is distributed evenly between 8% and 100% of the image area and whose aspect ratio is chosen randomly between 3/4 and 4/3.

So they did indeed use this scaling to train their InceptionNet.
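
To get a feel for how small the 8% end of the range is, here is a minimal pure-Python sketch. The helper `crop_dims` is a hypothetical name made up for illustration and is not part of torchvision; it only does the arithmetic for a crop that covers a given fraction of the image area.

```python
import math

def crop_dims(height, width, area_fraction, aspect_ratio=1.0):
    """Return (crop_h, crop_w) for a crop covering `area_fraction` of the image.

    `aspect_ratio` is crop width divided by crop height.
    Illustrative helper only, not a torchvision function.
    """
    target_area = area_fraction * height * width
    crop_w = round(math.sqrt(target_area * aspect_ratio))
    crop_h = round(math.sqrt(target_area / aspect_ratio))
    return crop_h, crop_w

# A 375x500 image (height x width) at both ends of the default scale range.
smallest = crop_dims(375, 500, 0.08)                        # square crop, 8% of the area
largest = crop_dims(375, 500, 1.0, aspect_ratio=500 / 375)  # the whole image
print(smallest, largest)
```

Because the scale is a fraction of the area, not of the side length, a square 8% crop of this image is still about 122 pixels on a side, which is roughly a quarter to a third of each dimension.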

Hi @ptrblck !
I have been reading the InceptionV1 paper. I understand why they need the RandomResizedCrop function, but I don’t know the original size of the image from which they sample the crop before resizing it to 224x224. Is it 256xN or Nx256 (where 256 is the smaller, rescaled side)?

If you’re using transforms.Resize and pass an int value to the size argument, the smaller edge will be matched to this number.
From the docs:

size ( sequence or int ) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size)

If I understand your question correctly, this approach would use both possible resizing strategies, depending on which edge is smaller.
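
As a small sketch of that rule in plain Python: `resized_shape` below is a hypothetical helper that just applies the formula quoted from the docs, and I am assuming truncation to an integer, so the library's exact rounding may differ by a pixel.

```python
def resized_shape(height, width, size):
    """Output (height, width) when the smaller edge is matched to `size`.

    Hypothetical helper following the rule quoted from the docs;
    the integer truncation here is an assumption.
    """
    if height > width:
        # Portrait: width is the smaller edge and becomes `size`.
        return int(size * height / width), size
    # Landscape or square: height is the smaller edge and becomes `size`.
    return size, int(size * width / height)

print(resized_shape(375, 500, 256))  # landscape input, height becomes 256
print(resized_shape(500, 375, 256))  # portrait input, width becomes 256
```

So a landscape image ends up as 256xN and a portrait image as Nx256, with the aspect ratio preserved in both cases.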

Let me know if I misunderstood the question.

Thanks a lot, I got it now.

@ptrblck Regarding the various patches, what is the center of the random patches? Is the center also random?

And the aspect ratio range is used while computing the patches, right? Not for the final image size, because the final image’s aspect ratio is inferred from the size parameter?

Yes, the top left pixel location as well as the height and width will be randomly sampled as seen here.

The aspect_ratio is used to create the w and h values and there is a fallback to a central crop here.
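
The sampling logic can be sketched roughly as follows in pure Python. This is my own simplified version written from the description above, not the torchvision source: the function name `sample_crop`, the `attempts` parameter, and its default of 10 are assumptions for illustration.

```python
import math
import random

def sample_crop(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), attempts=10):
    """Return (top, left, crop_h, crop_w) for a random crop inside the image.

    Simplified sketch, not the library implementation.
    """
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        # Sample the crop area and the aspect ratio (width / height).
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(*log_ratio))
        crop_w = round(math.sqrt(target_area * aspect))
        crop_h = round(math.sqrt(target_area / aspect))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            # The top-left corner is random, so the center is random too.
            top = random.randint(0, height - crop_h)
            left = random.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: central crop with the aspect ratio clamped to the allowed range.
    in_ratio = width / height
    if in_ratio < min(ratio):
        crop_w = width
        crop_h = round(crop_w / min(ratio))
    elif in_ratio > max(ratio):
        crop_h = height
        crop_w = round(crop_h * max(ratio))
    else:
        crop_h, crop_w = height, width
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w

print(sample_crop(375, 500))
```

The aspect ratio only shapes the sampled patch. The patch is then resized to the size you pass to RandomResizedCrop, so the final image's aspect ratio comes from that size argument.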

Thanks for confirming 🙂