Bigger images than model input size

Dear Community,

I came across this phenomenon earlier this week:

import torch
import torchvision
model = torchvision.models.densenet121(pretrained=True)
model.eval()

im224s = torch.zeros(size=(1, 3, 224, 224))
im360s = torch.zeros(size=(1, 3, 360, 360))
im640 = torch.zeros(size=(1, 3, 640, 360))
im1920 = torch.zeros(size=(1, 3, 1920, 1080))

print("--- TEST: testing 224x224 ---")
x = model(im224s)
print("--- TEST: OK              ---")

print("--- TEST: testing 360x360 ---")
x = model(im360s)
print("--- TEST: OK              ---")

print("--- TEST: testing 640x360 ---")
x = model(im640)
print("--- TEST: OK              ---")

print("--- TEST: testing 1920x1080 ---")
x = model(im1920)
print("--- TEST: OK              ---")

Returns:

--- TEST: testing 224x224 ---
--- TEST: OK              ---
--- TEST: testing 360x360 ---
--- TEST: OK              ---
--- TEST: testing 640x360 --- 
--- TEST: OK              ---
--- TEST: testing 1920x1080 --- 
--- TEST: OK              ---

When I feed images into my vanilla pretrained densenet121 that are bigger than the expected input size (224x224), the model still returns an output.

I get that the input size doesn't matter much for the convolution kernels themselves, since feature maps can still be built, but the size of those feature maps depends on the input image.
So if this works, using a bigger-than-expected image must involve some form of information throw-away, at least right before the final fully connected layer, which has a fixed-size input.
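To make that concrete, here is a small sketch (using the same pretrained densenet121 as above) that runs only the convolutional part, model.features, and prints the shape of the resulting feature maps for different input sizes:

import torch
import torchvision

model = torchvision.models.densenet121(pretrained=True).eval()

# The convolutional feature extractor alone: its output grows with the input size.
with torch.no_grad():
    for hw in [(224, 224), (360, 360), (640, 360)]:
        feats = model.features(torch.zeros(1, 3, *hw))
        print(hw, "->", tuple(feats.shape))

# This should print something like:
# (224, 224) -> (1, 1024, 7, 7)
# (360, 360) -> (1, 1024, 11, 11)
# (640, 360) -> (1, 1024, 20, 11)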

So how do PyTorch models handle over-sized images?

Cheers

tl;dr: the output size of a feature map is calculated like this:
Size = (Size_pre - FilterSize + 2*Padding)/Stride + 1, i.e. the size of a feature map depends on the size of the previous feature map and the filter size. How does PyTorch handle over-sized images? It does not throw an error.
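As a quick sanity check of that formula, here is a toy conv layer (not part of the DenseNet itself, just for illustration):

import torch
import torch.nn as nn

# out = (in - kernel + 2*padding) / stride + 1
conv = nn.Conv2d(3, 8, kernel_size=7, stride=2, padding=3)
x = torch.zeros(1, 3, 224, 224)
print(conv(x).shape)               # torch.Size([1, 8, 112, 112])
print((224 - 7 + 2 * 3) // 2 + 1)  # 112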

Okay, I solved the issue myself. In the forward function of the DenseNet one can see that adaptive average pooling 2d is used. There, the pooling window is calculated such that the result is (channels x 1 x 1) regardless of the input size.

forward function of densenet:

    def forward(self, x):
        features = self.features(x)
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))  # the magic happens here
        out = torch.flatten(out, 1)
        out = self.classifier(out)
        return out
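
A small illustration of what that adaptive pooling does (made-up tensor sizes, just to show the behaviour):

import torch
import torch.nn.functional as F

# Whatever the spatial size coming out of the feature extractor,
# adaptive_avg_pool2d(..., (1, 1)) averages each channel map down to a single value.
for hw in [(7, 7), (11, 11), (60, 34)]:
    out = torch.randn(1, 1024, *hw)
    pooled = F.adaptive_avg_pool2d(out, (1, 1))
    print(hw, "->", tuple(pooled.shape))   # always (1, 1024, 1, 1)

So for larger images the only "information throw-away" happens here: the classifier always sees a 1024-dimensional vector, it is just averaged over a larger spatial grid.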