Change the input size of timm's ViT model

Hi,

I have a trained Vision Transformer from timm (based on Google’s ViT paper) and I want to change the input size so that it accepts inputs with larger dimensions.
Code: pytorch-image-models/vision_transformer.py at master · rwightman/pytorch-image-models · GitHub

My model that I want to change its input size:

model = timm.models.vit_base_patch16_224_in21k(pretrained=True)

I tried accessing the dictionary with the input size

timm.models.vision_transformer.default_cfgs['vit_base_patch16_224_in21k']['input_size'] = (3, 400, 400)

But that doesn’t seem to solve it. I still get the following error (I know I could use a 384×384 pretrained model, but that’s not the question here):

f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
AssertionError: Input image size (400*400) doesn't match model (224*224).

I think that I should add a layer before the model to solve that. Any ideas?
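One way to realize the “layer before the model” idea is a small wrapper module that resizes the input down to the size the ViT expects before forwarding it. This is a minimal sketch (the `ResizeWrapper` name is hypothetical, not part of timm), and note it throws away the extra resolution rather than letting the model use it:

```python
import torch
import torch.nn.functional as F

class ResizeWrapper(torch.nn.Module):
    """Hypothetical wrapper: resize any input to the size the inner model expects."""

    def __init__(self, model, size=(224, 224)):
        super().__init__()
        self.model = model
        self.size = size

    def forward(self, x):
        # Bilinearly resize e.g. 400x400 inputs down to 224x224 before the ViT.
        x = F.interpolate(x, size=self.size, mode="bilinear", align_corners=False)
        return self.model(x)

# Usage with the pretrained model from above:
# model = timm.models.vit_base_patch16_224_in21k(pretrained=True)
# wrapped = ResizeWrapper(model)
# logits = wrapped(torch.randn(1, 3, 400, 400))
```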

The function

timm.models.vit_base_patch16_224_in21k(pretrained=True)

calls the function

_create_vision_transformer

which in turn calls

build_model_with_cfg(...)

This function creates an instance of the class VisionTransformer(nn.Module) (currently line 230) with the following (default) parameters:

img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4., qkv_bias=True, representation_size=None, distilled=False, drop_rate=0., attn_drop_rate=0., drop_path_rate=0., embed_layer=PatchEmbed, norm_layer=None, act_layer=None, weight_init=''

If you don’t need a pretrained model, you can create an instance of this class directly with the required parameters.