How to pass an image of any size to a pretrained PyTorch ViT model?

import torch
from torchvision import models

model = models.vit_b_32(pretrained=True, image_size=320)
model.eval()

The above piece of code fails at line 3 (the vit_b_32 call) with the following error:

ValueError: The parameter 'image_size' expected value 224 but got 320 instead. 

So do PyTorch's pre-trained Vision Transformer models only accept a fixed input image size, unlike pre-trained ResNets, which are flexible with the image size?

I am hesitant to downsize my images because I am trying to perform crack detection on metal surfaces. After downsizing to 224, the crack pixels become far too small, which I believe may hurt my model's performance. When I train my model on ResNets, I get the best performance for image sizes > 400px.

If you want the pretrained weights, yes; 224 is the de facto size. If you do not need pretrained weights, you can specify the image_size argument yourself (the patch_size is fixed at 32 by the vit_b_32 builder; to change both, instantiate the underlying VisionTransformer class directly).
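For example, a minimal sketch (assuming a reasonably recent torchvision) of building a randomly initialized ViT-B/32 for 320x320 inputs:

import torch
from torchvision import models

# No pretrained weights are requested, so a custom image_size is accepted.
# 320 must be divisible by the builder's fixed patch size of 32.
model = models.vit_b_32(image_size=320)
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 320, 320))
print(out.shape)  # torch.Size([1, 1000])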

Now, you might be able to get away with the pretrained weights if you swap out some layers and then redefine the image_size after the fact. Note that you'll want to keep the patch_size unchanged.
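A hedged sketch of that idea, assuming your torchvision version ships the interpolate_embeddings helper in torchvision.models.vision_transformer: it resizes the learned position embeddings (here from the 7x7 grid of a 224px input to the 10x10 grid of a 320px input), while every other pretrained weight is reused and patch_size stays 32:

import torch
from torchvision import models
from torchvision.models.vision_transformer import interpolate_embeddings

# 1. Load the pretrained 224px ViT-B/32 to get its state dict.
pretrained = models.vit_b_32(pretrained=True)

# 2. Interpolate the position embeddings from the 224px grid to the 320px grid.
new_state = interpolate_embeddings(
    image_size=320,
    patch_size=32,
    model_state=pretrained.state_dict(),
)

# 3. Build a ViT-B/32 for 320px inputs (no weights loaded) and load the
#    adjusted state dict into it.
model = models.vit_b_32(image_size=320)
model.load_state_dict(new_state)
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 320, 320))
print(out.shape)  # torch.Size([1, 1000])

Expect to fine-tune at the new resolution afterwards; only the position embeddings are adapted, so the pretrained weights have never actually seen 320px inputs.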

Here is the model code: