Maybe I am missing something obvious, but if a pre-trained network’s input image size is 300x300 or 500x500, how would the performance of classification/segmentation inference be affected when I feed it an image upsampled from 224px?
While I did observe differences in the outputs, are there experiments one could run to understand how these perturbations in image size affect the model’s inference?
In my practical experience, upsampling typically degrades results more than downsampling does. However, since this is only a factor of roughly 2x, it shouldn’t be a big issue.
To avoid rescaling issues altogether, you could also choose an architecture that is agnostic to the input dimensions, e.g., a fully convolutional network (avoiding FC layers), or add a spatial pyramid pooling layer before the FC layer(s).
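To illustrate why such architectures are input-size agnostic: a global (adaptive) average pool collapses whatever spatial size the conv backbone produces into a fixed-length vector, so the classifier head works for any input resolution. Here is a minimal pure-Python sketch of the pooling step (in PyTorch this is what `nn.AdaptiveAvgPool2d(1)` does); the nested-list "feature maps" are just illustrative stand-ins:

```python
def global_avg_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to a length-C vector.

    The output length depends only on the channel count C, not on H or W,
    which is why the layer after it never sees a shape mismatch when the
    input image size changes.
    """
    pooled = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        pooled.append(total / count)
    return pooled

# Two feature maps with different spatial sizes but the same channel count:
small = [[[1.0, 2.0], [3.0, 4.0]]]        # 1 channel, 2 x 2
large = [[[1.0] * 4 for _ in range(4)]]   # 1 channel, 4 x 4

print(global_avg_pool(small))  # [2.5]
print(global_avg_pool(large))  # [1.0]
```

Both calls return a length-1 vector despite the different spatial extents; spatial pyramid pooling generalizes this by pooling at several fixed grid sizes and concatenating the results.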