import torch
from torchvision import models
model = models.vit_b_32(pretrained=True, image_size=320)
model.eval()
The above code fails at line 3 with the following error:
ValueError: The parameter 'image_size' expected value 224 but got 320 instead.
So does PyTorch's pre-trained Vision Transformer only accept a fixed input image size, unlike pre-trained ResNets, which are flexible with the image size?
I am hesitant to downsize my images because I am performing crack detection on metal surfaces. After downsizing to 224, the crack pixels become far too small, which I believe may hurt my model's performance. When I train my model on ResNets, I get optimal performance for image sizes > 400 px.
If pretrained, yes: 224 is the de facto size. If you do not need pretrained weights, you can specify the image_size and patch_size arguments yourself.
Now, you might still be able to get away with pretrained weights if you swap out some layers and then redefine the image_size after the fact. Note that you'll want to keep the patch_size unchanged.