Hi,
I have a trained Vision Transformer from timm (Google's ViT paper), and I want to change its input size so I can use inputs with larger spatial dimensions.
Code: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py
The model whose input size I want to change:
model = timm.models.vit_base_patch16_224_in21k(pretrained=True)
I tried overriding the input size in the model's default config dictionary:
timm.models.vision_transformer.default_cfgs['vit_base_patch16_224_in21k']['input_size'] = (3, 400, 400)
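For completeness, here is the full flow I'm running (torch.randn is just a stand-in for my real 400×400 batches):

```python
import timm
import torch

# Override the input size in the default config before building the model
timm.models.vision_transformer.default_cfgs['vit_base_patch16_224_in21k']['input_size'] = (3, 400, 400)

model = timm.models.vit_base_patch16_224_in21k(pretrained=True)
model.eval()

x = torch.randn(1, 3, 400, 400)  # stand-in for a real 400x400 batch
with torch.no_grad():
    out = model(x)  # still fails with the assertion below
```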
But it doesn't seem to solve it. I still get the following error (I know I could use a 384×384 pretrained model, but that's not the question here):
f"Input image size ({H}{W}) doesn’t match model ({self.img_size[0]}{self.img_size[1]})."
AssertionError: Input image size (400400) doesn’t match model (224224).
I think I could add a layer in front of the model to work around this.
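A minimal sketch of what I have in mind (ResizeWrapper is just a placeholder name I made up; it bilinearly resizes the input down to 224×224 before the forward pass):

```python
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResizeWrapper(nn.Module):
    """Placeholder wrapper: resizes inputs to the size the ViT expects."""
    def __init__(self, model, size=(224, 224)):
        super().__init__()
        self.model = model
        self.size = size

    def forward(self, x):
        # Bilinearly downsample e.g. 400x400 inputs to 224x224
        x = F.interpolate(x, size=self.size, mode='bilinear', align_corners=False)
        return self.model(x)

model = timm.models.vit_base_patch16_224_in21k(pretrained=True)
wrapped = ResizeWrapper(model)

x = torch.randn(1, 3, 400, 400)
out = wrapped(x)  # no size assertion anymore
print(out.shape)
```

But downsampling back to 224×224 throws away the extra resolution I wanted in the first place, so I'm not sure this is the right approach. Any ideas?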