I am trying to follow the data augmentation practice in the original ResNet paper, *Deep Residual Learning for Image Recognition*, which includes:
> The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation [41]. A 224×224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted [21]. The standard color augmentation in [21] is used.
Here, the random resize is explicitly defined to fall in the range [256, 480], whereas in the PyTorch implementation of `RandomResizedCrop` we can only control the `scale` and `ratio` of the crop relative to the original image, no matter what the resulting size is. While this seems reasonable for keeping the output resolution consistent, I wonder:
- Are there ways to make `RandomResizedCrop` behave in the "explicit resizing" way? (If not, I'll consider implementing my own.)
- What is the essential reason for ditching the "explicit resizing" of the original paper? Do papers nowadays all prefer the "ratio resizing" version?
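For reference, here is my understanding of what `RandomResizedCrop` samples, written as a pure-Python sketch (the function name and structure are my own; the sampling follows torchvision's documented behavior of drawing an area fraction from `scale` and an aspect ratio from `ratio`, then resizing the crop to a fixed output size):

```python
import math
import random

def sample_random_resized_crop(width, height, size=224,
                               scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sketch of RandomResizedCrop's parameter sampling: pick a crop whose
    area is a fraction (in `scale`) of the source area and whose aspect
    ratio lies in `ratio`, then resize that crop to `size` x `size`."""
    area = width * height
    for _ in range(10):  # torchvision similarly retries a few times
        target_area = area * rng.uniform(*scale)
        # sample aspect ratio log-uniformly, as torchvision does
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return (left, top, w, h), size
    # simplified fallback: center crop of the shorter side
    s = min(width, height)
    return ((width - s) // 2, (height - s) // 2, s, s), size
```

The key point is that the sampled crop size depends on the *original* image dimensions, not on any intermediate resized size, which is exactly where it diverges from the paper's recipe.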
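In case I do end up implementing the "explicit resizing" myself, here is a minimal sketch of the geometry (the helper name is hypothetical; actual resizing/cropping would be done with e.g. `torchvision.transforms.functional.resize` and `crop` using these values):

```python
import random

def sample_resize_and_crop(width, height, short_range=(256, 480), crop=224, rng=random):
    """Paper-style augmentation: sample a target size whose shorter side
    lies in short_range, then sample a crop x crop window inside the
    resized image. Returns the resized (w, h) and the crop box."""
    short = rng.randint(*short_range)        # shorter side after resizing
    scale = short / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # the resized shorter side (>= 256) always fits a 224 crop
    left = rng.randint(0, new_w - crop)
    top = rng.randint(0, new_h - crop)
    return (new_w, new_h), (left, top, left + crop, top + crop)
```

Here the crop size is fixed at 224 in the *resized* image, so the effective zoom level is driven entirely by the sampled shorter side, matching the paper's description.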