Input image size to pretrained resnet34

I’m using a pretrained resnet34 as the encoder in my UNet model. The pretrained resnet34 documentation says the input image size is 224x224x3, but I’ve tried both 224x224x3 and 512x512x3 inputs and the model performance is not much different between them. Is it OK to use the 512x512x3 input, considering its image quality is better than 224x224x3?
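
For context, a minimal sketch of the kind of setup I mean (assuming torchvision's resnet34 as the encoder; the decoder is omitted and the tapped node names are just illustrative):

import torch
from torchvision.models import resnet34, ResNet34_Weights
from torchvision.models.feature_extraction import create_feature_extractor

# Pretrained resnet34 used as a UNet-style encoder: tap the feature maps
# that the decoder would receive through skip connections.
backbone = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
encoder = create_feature_extractor(
    backbone,
    return_nodes={"relu": "s1", "layer1": "s2", "layer2": "s3",
                  "layer3": "s4", "layer4": "s5"},
)

x = torch.rand(1, 3, 512, 512)      # 512x512x3 input instead of 224x224x3
features = encoder(x)
for name, feat in features.items():
    print(name, tuple(feat.shape))  # spatial sizes simply scale with the input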

If you are using a pretrained model (without finetuning), then it is expected that higher resolution does not necessarily give better accuracy, since higher-resolution images can present a different distribution of object scales than what the model was trained on. This paper is a good read on what causes this behavior: [1906.06423] Fixing the train-test resolution discrepancy
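
If you do commit to 512x512 inputs, the fix that paper proposes is to briefly fine-tune the network at the test resolution. A minimal sketch of that idea (assuming an ImageNet-pretrained resnet34 with a classification head; the dummy batch stands in for your own 512x512 data, and only the classifier is adapted here to keep it short):

import torch
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights

# Start from weights trained on ~224x224 crops.
model = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)

# Freeze the convolutional features and adapt only the classifier head
# to the new resolution (the paper additionally adapts batch-norm layers).
for p in model.parameters():
    p.requires_grad = False
num_classes = 10                                   # placeholder for your task
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One dummy training step at the target resolution.
images = torch.rand(4, 3, 512, 512)
labels = torch.randint(0, num_classes, (4,))
optimizer.zero_grad()
criterion(model(images), labels).backward()
optimizer.step()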

The classification models in torchvision don't actually resize the input: they are fully convolutional up to an AdaptiveAvgPool2d layer that collapses the spatial dimensions right before the final fully connected layer, so the forward pass accepts a range of input sizes (the 224x224 figure comes from the preprocessing transforms bundled with the pretrained weights). When using vision's models, one should always be careful, as this kind of logic is baked into the model and its transforms and is not obvious at the call site. Personally I am not a fan of this approach.

>>> import torch
>>> import torchvision
>>> model = torchvision.models.resnet34()
>>> A = torch.rand(4,3,256,256)
>>> B = torch.rand(4,3,512,512)
>>> model(A)
tensor([[-0.0159,  0.6987,  0.0230,  ...,  0.5298,  0.7964, -0.7284],
        [ 0.0107,  0.6738,  0.1062,  ...,  0.5328,  0.8554, -0.8342],
        [ 0.0376,  0.7694,  0.0367,  ...,  0.4300,  0.7086, -0.7498],
        [-0.1892,  0.6690,  0.0665,  ...,  0.4711,  0.8034, -0.7326]],
       grad_fn=<AddmmBackward0>)
>>> model(B)
tensor([[-0.0178,  0.6623,  0.1057,  ...,  0.4388,  0.8645, -0.8103],
        [ 0.0054,  0.7318,  0.0728,  ...,  0.4826,  0.8127, -0.7748],
        [-0.0430,  0.7653,  0.0638,  ...,  0.5089,  0.8085, -0.7793],
        [-0.0374,  0.6968,  0.1174,  ...,  0.5085,  0.7960, -0.8011]],
       grad_fn=<AddmmBackward0>)
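
To make what is happening visible, here is a small sketch along the same lines (node names as exposed by torchvision's feature_extraction utility): the last feature map grows with the input, while the adaptively pooled vector fed to the fully connected layer is always 512-dimensional.

import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

model = torchvision.models.resnet34()
print(model.avgpool)                      # AdaptiveAvgPool2d(output_size=(1, 1))

# Tap the last convolutional stage and the pooled output.
extractor = create_feature_extractor(model, return_nodes=["layer4", "avgpool"])

for size in (256, 512):
    feats = extractor(torch.rand(4, 3, size, size))
    print(size, tuple(feats["layer4"].shape), tuple(feats["avgpool"].shape))
# 256 -> layer4 (4, 512, 8, 8),   avgpool (4, 512, 1, 1)
# 512 -> layer4 (4, 512, 16, 16), avgpool (4, 512, 1, 1)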