Larger image to Imagenet pretrained resolution


I am using the ImageNet-pretrained ResNet-18 model, and according to torchvision.models — PyTorch 1.7.0 documentation, the images fed into the model have to be 224x224. The problem is that my input image is much larger, for example 2500x2500 or some other arbitrary resolution. I am looking for a way to feed in my images, possibly with a first convolution layer that would "map" from my image space to that of the ImageNet-pretrained one. Almost like a downsampling of sorts, but implemented as a convolution layer. The result would be that all layers keep the pretrained weights except the first one, which would be trained on my input images.

Does this sound like something tenable? What would be a way to proceed here?

Thank you for the help.

@ptrblck you had some knowledge about Imagenet pretrained models image dimensions

Any techniques that could be used for this use case? Thanks.

It’s not impossible to apply ResNet to an image of arbitrary resolution, because the penultimate layer is an AdaptiveAvgPool2d. However, the model may not work well with an input overwhelmingly larger than the training images.

Thanks for replying.

Wouldn’t the penultimate layer still work on the downstream convolutions, which in this case remain the same? Would creating a new layer (or using some other method) at the beginning of the model affect this?

Of course it still works. I suggest that you just resize (with cv2, scikit-image, or whatever library you prefer) and crop (or pad) your input image to 224x224, then feed it into ResNet.

I have tried this already, but the issue is that my input image is quite high resolution and carries a lot of information. Resizing it smaller results in much lower performance compared to my previous baseline measures, which is why I’m looking into somehow feeding in the original (larger) image.

How is the performance when you just feed the original image to ResNet?

This is the main issue I’m currently facing.

@ptrblck had said in Imagenet pretrained models image dimensions that I should stick with 224x224 for the best performance, which I’d like to do. This is why I’m asking about possible ways of adding an additional layer at the beginning so that I can then fully use the later layers, if that makes sense.

He was right; 224x224 is the best resolution for performance.
If you have sufficient computational resources, you can try resizing your images to 1120x1120 or even larger and then retraining your model.

In case you don’t have to stick with the original ResNet, you can try models that use dilated convolutions.