I am trying to follow the data augmentation practice in the original ResNet paper, *Deep Residual Learning for Image Recognition*, which includes:
> The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation [41]. A 224×224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted [21]. The standard color augmentation in [21] is used.
Here, the random resize is explicitly defined to fall in the range [256, 480], whereas in the PyTorch implementation of `RandomResizedCrop` we can only control the `scale` and `ratio` of the crop relative to the original image, no matter what the resulting size is. While this seems reasonable for keeping the output resolution consistent, I wonder:
- Are there ways to make `RandomResizedCrop` behave in the "explicit resizing" way? (If not, I'll consider implementing my own.)
- What is the essential reason for ditching the "explicit resizing" of the original paper? Do papers nowadays all prefer the "ratio resizing" version?
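For reference, here is my understanding of what `RandomResizedCrop` samples, written as a pure-Python sketch (the function name and structure are my own; the sampling follows torchvision's documented behavior of drawing an area fraction from `scale` and an aspect ratio from `ratio`, then resizing the crop to a fixed output size):

```python
import math
import random

def sample_random_resized_crop(width, height, size=224,
                               scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sketch of RandomResizedCrop's parameter sampling: pick a crop whose
    area is a fraction (in `scale`) of the source area and whose aspect
    ratio lies in `ratio`, then resize that crop to `size` x `size`."""
    area = width * height
    for _ in range(10):  # torchvision similarly retries a few times
        target_area = area * rng.uniform(*scale)
        # sample aspect ratio log-uniformly, as torchvision does
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return (left, top, w, h), size
    # simplified fallback: center crop of the shorter side
    s = min(width, height)
    return ((width - s) // 2, (height - s) // 2, s, s), size
```

The key point is that the sampled crop size depends on the *original* image dimensions, not on any intermediate resized size, which is exactly where it diverges from the paper's recipe.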
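In case I do end up implementing the "explicit resizing" myself, here is a minimal sketch of the geometry (the helper name is hypothetical; actual resizing/cropping would be done with e.g. `torchvision.transforms.functional.resize` and `crop` using these values):

```python
import random

def sample_resize_and_crop(width, height, short_range=(256, 480), crop=224, rng=random):
    """Paper-style augmentation: sample a target size whose shorter side
    lies in short_range, then sample a crop x crop window inside the
    resized image. Returns the resized (w, h) and the crop box."""
    short = rng.randint(*short_range)        # shorter side after resizing
    scale = short / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # the resized shorter side (>= 256) always fits a 224 crop
    left = rng.randint(0, new_w - crop)
    top = rng.randint(0, new_h - crop)
    return (new_w, new_h), (left, top, left + crop, top + crop)
```

Here the crop size is fixed at 224 in the *resized* image, so the effective zoom level is driven entirely by the sampled shorter side, matching the paper's description.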