Hi, I am playing with the pre-trained ResNet101 in torchvision. I tried different input image sizes (224x224, 336x336, 224x336) and they all seem to work. So what is the exact valid range of input sizes for the pre-trained ResNet?
I think the valid input size of images is 224x224. Maybe you are using preprocessing in your code, so whatever the input size of the image is, it gets cropped to 224x224. Thanks.
I printed modules in the ResNet and found why:
The AvgPool before the last FC is like this:
(avgpool): AvgPool2d (size=7, stride=7, padding=0, ceil_mode=False, count_include_pad=True)
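Therefore, as long as the input image size makes the AvgPool output a tensor of size 1x2048x1x1, there is no problem. But if the input is larger than 224x224 (and still small enough to give a 1x1 output), the single 7x7 pooling window covers only part of the final feature map, so the rest is implicitly cropped by ResNet at the AvgPool layer.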
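To make the size arithmetic concrete, here is a rough sketch in plain Python (no PyTorch needed). It assumes the usual torchvision ResNet layout: conv1 with kernel 7, stride 2, padding 3; a max pool with kernel 3, stride 2, padding 1; and three later stages that each halve the resolution. The helper names are made up for illustration. If those assumptions hold, it suggests which input sizes leave the fixed 7x7 AvgPool with a 1x1 output.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a conv/pool layer along one dimension (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def resnet_feature_size(n):
    """Spatial size entering avgpool, assuming the usual ResNet strides."""
    n = conv_out(n, kernel=7, stride=2, padding=3)      # conv1
    n = conv_out(n, kernel=3, stride=2, padding=1)      # maxpool
    for _ in range(3):                                  # layer2, layer3, layer4
        n = conv_out(n, kernel=3, stride=2, padding=1)
    return n


def fixed_avgpool_out(n):
    """Output length of AvgPool2d(7, stride=7, padding=0, ceil_mode=False)."""
    return conv_out(n, kernel=7, stride=7, padding=0)


for size in (224, 336, 448):
    feat = resnet_feature_size(size)
    print(size, feat, fixed_avgpool_out(feat))

# Side lengths for which the pooled output is exactly 1 along that dimension.
valid = [n for n in range(1, 1025)
         if fixed_avgpool_out(resnet_feature_size(n)) == 1]
print(min(valid), max(valid))
```

Under these assumptions, 224 gives a 7x7 feature map and 336 gives 11x11, and both pool down to 1x1, which is why the FC layer accepts them. A 448 input gives 14x14, which pools to 2x2 and no longer matches the FC layer. With an 11x11 map, the one 7x7 window only sees the top-left 7x7 region.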
No, the resnet18 model architecture itself has an AdaptiveAvgPool2d layer at the end. This layer ensures that an input image of any size gets converted to a fixed-size output, so varying input sizes are not a problem for the dense layers.
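For intuition, here is a minimal plain-Python sketch of what adaptive average pooling to a 1x1 output does for a single channel: it averages the whole feature map, whatever its height and width. In PyTorch the corresponding layer is nn.AdaptiveAvgPool2d((1, 1)); the functions below are hypothetical stand-ins written only to illustrate the idea, not the library's implementation.

```python
def global_avg_pool(feature_map):
    """Average every value in one channel's H x W map (adaptive pool to 1x1)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def make_map(height, width, fill=1.0):
    """Hypothetical helper: a constant H x W feature map as nested lists."""
    return [[fill] * width for _ in range(height)]


# Feature-map sizes roughly matching 224x224, 336x336 and 224x336 inputs.
for h, w in ((7, 7), (11, 11), (7, 11)):
    pooled = global_avg_pool(make_map(h, w))
    print((h, w), "->", pooled)  # one number per channel, whatever h and w are
```

Because each channel collapses to a single number, the FC layer always receives the same number of features, and no part of the feature map is dropped.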