The pretrained models most likely stick to the literature for the corresponding architecture, which often used input images of shape 224 x 224 (often randomly cropped to this shape).
Since these torchvision models use adaptive pooling layers, the strict size restriction was relaxed, so you can pass larger images and (some) smaller ones. Note that the minimum size depends on the conv and pooling operations, which would create an empty output if the input is too small.
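E.g. this small sketch shows both behaviors, using vgg16 just as an example (the minimum size differs between architectures):

```python
import torch
from torchvision import models

model = models.vgg16(weights=None)  # random init is enough to check shapes
model.eval()

# Larger inputs work: the AdaptiveAvgPool2d((7, 7)) before the classifier
# maps any sufficiently large activation to the expected 7x7 shape.
out = model(torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 1000])

# Too-small inputs fail: the five stride-2 max pools reduce a 16x16 input
# to an empty activation before the adaptive pooling is reached.
try:
    model(torch.randn(1, 3, 16, 16))
except RuntimeError as e:
    print(e)  # "... Output size is too small"
```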
That being said, I would not expect good performance for input images with largely different shapes and would try to fine-tune the model for this use case.
I am able to pass much larger images and, as you said, the adaptive pooling layer before the fully connected layer allows for this.
What do you mean by fine-tuning the model? Fine-tuning in the transfer-learning sense, or something more structural? For example, I was thinking of adding a convolutional layer at the beginning of the pretrained network, followed by an AdaptiveAvgPool2d layer to bring the output size down to 224x224, and then passing that into the next conv layer, which is in fact the first layer of the pretrained network. Does this make sense?
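Roughly something like this sketch (using resnet18 only as a placeholder backbone; the extra conv's channel and kernel sizes are just assumptions on my side):

```python
import torch
import torch.nn as nn
from torchvision import models

class AdaptedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # New trainable conv at the start, keeping 3 channels so the
        # pretrained first conv still receives a 3-channel input
        self.pre_conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        # Bring the spatial size down to the 224x224 the backbone expects
        self.pre_pool = nn.AdaptiveAvgPool2d((224, 224))
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    def forward(self, x):
        x = self.pre_pool(self.pre_conv(x))
        return self.backbone(x)

model = AdaptedModel()
out = model(torch.randn(1, 3, 1024, 1024))  # much larger input
print(out.shape)  # torch.Size([1, 1000])
```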