Image resolution

How can I feed images with 1024x1024 pixels into the pre-trained VGG16 model?
How do I modify that in PyTorch code?

It depends on what you mean by feeding in 1024x1024 images. If you are using a pretrained model without further finetuning, you likely want to simply resize the images to something around 256x256 and take a 224x224 center crop. If you decide not to resize or crop, the model should “work” at 1024x1024, in that there will not be any shape mismatches thanks to the (adaptive) average pooling before the classifier, but the accuracy will likely not be great due to the mismatch in object scale. Choosing the right resolution is actually a tricky problem (e.g., as studied in [1906.06423] Fixing the train-test resolution discrepancy).
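
If you go the resize-and-crop route, the standard ImageNet preprocessing looks roughly like this (a minimal sketch assuming torchvision; the image path is just a placeholder):

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing: resize the short side to 256,
# center-crop to 224x224, then normalize with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.vgg16(pretrained=True).eval()

img = Image.open("example.jpg")    # e.g., a 1024x1024 image (placeholder path)
x = preprocess(img).unsqueeze(0)   # shape: (1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
```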

Finally, you should consider that at 1024x1024, the model will use roughly (1024/224)**2 or ~21 times the memory and computation, since convolutional activation size scales quadratically with the input side length.
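
To see both points concretely, here is a small sanity check (a sketch; it assumes torchvision's VGG16, whose adaptive average pooling makes the forward pass run at either resolution):

```python
import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()

# Both resolutions produce (1, 1000) logits because of the adaptive
# average pooling before the classifier, but the 1024x1024 forward
# pass uses roughly (1024/224)**2 ~= 21x the activation memory/compute.
with torch.no_grad():
    print(model(torch.randn(1, 3, 224, 224)).shape)    # torch.Size([1, 1000])
    print(model(torch.randn(1, 3, 1024, 1024)).shape)  # torch.Size([1, 1000])
```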


The images vary in size, over 1000x1000, and I was feeding 224x224 inputs to the vgg16_bn model; the accuracy is 53%.
Do you think the accuracy will be higher if I change the input image size, or not?
Also, in my code I can't change the input sizes of those images.
How do I modify the model to receive the 1024x1024 images?
Thanks man

The important thing is that the scale of objects (e.g., how many pixels a cat or dog spans) is close to what the model saw at training time. So if you just change the input size without finetuning or retraining, the accuracy will most likely go down.

You don’t need to modify the model for higher resolution, but some kind of finetuning or retraining is probably needed.
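
A minimal finetuning sketch along these lines (assumptions: 10 classes and a synthetic batch standing in for your real 1024x1024 data; at this resolution you may need a small batch size to fit in memory):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # assumed: replace with your dataset's class count
model = models.vgg16_bn(pretrained=True)
model.classifier[6] = nn.Linear(4096, num_classes)  # swap the final layer

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
# One illustrative step on a synthetic batch; in practice, iterate over
# a DataLoader that yields your 1024x1024 images and labels.
images = torch.randn(2, 3, 1024, 1024)
labels = torch.randint(0, num_classes, (2,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```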
