Pretrained VGG feature extractor with bigger input images – PyTorch

How does PyTorch initialize the other parameters of a vgg16 feature extractor (no classifier) if the input images are much bigger than the ones used during pretraining?

So let's say I use 700 x 1200 colour images (not cropped) as input.
Create the vgg16 feature-extractor model and load the pretrained values.
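
In code, the setup I mean looks roughly like this (just a sketch; assuming a recent torchvision with the `weights` API, older versions use `pretrained=True`):

```python
import torch
import torchvision

# Sketch of the setup: only the conv/pool feature extractor, no classifier.
vgg16 = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
features = vgg16.features

x = torch.randn(1, 3, 700, 1200)  # one 700 x 1200 colour image
out = features(x)
print(out.shape)  # larger spatial output than for a 224 x 224 input
```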

From my point of view the parameters for the standard vgg16 size are given (224 x 224).
But what happens to the parameters “outside” of this area – does PyTorch initialize them …

  • randomly?
  • some kind of ‘interpolation’ based on the pretrained values?

@ptrblck any idea? :slight_smile:

P.S. Is there any official source for this particular behaviour?

The classifier of VGG is defined here as:

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, num_classes),
        )

and does not change the in_features depending on the input shape.
This might sound strange as a larger input should have more “features”.
However, torchvision classification models use adaptive pooling layers (as seen here) to create the desired activation output shape and avoid shape mismatch errors in the first linear layer. This allows you to use variable input shapes as long as the inputs are not too small, which would otherwise raise errors in e.g. conv or pooling layers.
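
A rough sketch of this behaviour (assuming a recent torchvision release; the `weights` enum is just one way to load the pretrained parameters):

```python
import torch
import torchvision

# Sketch: the full VGG16 accepts different spatial input sizes because an
# nn.AdaptiveAvgPool2d((7, 7)) layer sits between the conv features and the
# classifier, so the first linear layer always sees 512 * 7 * 7 features.
model = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
model.eval()

with torch.no_grad():
    for size in [(224, 224), (700, 1200)]:
        x = torch.randn(1, 3, *size)
        feats = model.features(x)      # spatial size depends on the input
        pooled = model.avgpool(feats)  # always pooled to [1, 512, 7, 7]
        logits = model(x)              # classifier works for both inputs
        print(size, feats.shape, pooled.shape, logits.shape)
```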

@ptrblck
I think there is a misunderstanding.
I am only referring to the vgg16 feature_extractor itself (without the classifier).

So the parameters for the pretrained weights are shown in orange (224 x 224).
[figure: 224 x 224 orange area (pretrained input size) inside a larger blue input area]

What happens if my input is larger (blue area)?
How are those parameters outside the orange (pretrained area) initialized?

So let's say I use 700 x 1200 colour images (not cropped) as input (blue area in the figure above).
Create the vgg16 feature-extractor model and load the pretrained values (orange area = pretrained).

From my point of view the parameters for the standard vgg16 size are given (224 x 224) → orange area.
But what happens to the parameters “outside” of this area (blue area) – does PyTorch initialize them …

  • randomly?
  • some kind of ‘interpolation’ based on the pretrained values?
  • …

P.S. Is there any official source for this particular behaviour?

I think there is a misunderstanding between the (input) activation shape and the trainable parameters.
The weights of conv layers are typically small filter kernels with a spatial shape of 3x3 or 5x5.
The orange area represents the spatial input size, and the conv layer applies a convolution to it using the smaller kernel. The same logic applies to the blue box: the filter kernel just has to “move around more” (of course, internally conv layers don’t have to use a nested loop to shift the filter kernel).
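
To make this concrete, here is a small sketch (assuming a recent torchvision; the `weights` enum is just an example) that prints the parameter shapes of the feature extractor and the activation shapes for both input sizes:

```python
import torch
import torchvision

# Sketch: the trainable parameters of the feature extractor are small 3x3
# kernels; their shapes are independent of the spatial input size.
features = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1
).features

for name, param in features.named_parameters():
    print(name, tuple(param.shape))  # e.g. 0.weight (64, 3, 3, 3) -- no 224 or 700 anywhere

# Only the activation (output) shapes change with the input size.
small = features(torch.randn(1, 3, 224, 224))
large = features(torch.randn(1, 3, 700, 1200))
print(small.shape, large.shape)  # [1, 512, 7, 7] vs. [1, 512, 21, 37]
```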

These animations might help.
E.g. take a look at the first animation: you can increase the size of the blue input, which allows the dark blue filter to be moved around more and creates a larger spatial output size.
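
The arithmetic behind those animations can also be written down directly; a small sketch for the VGG16 stages (the 3x3 convs with padding 1 keep the spatial size, each 2x2 max-pool halves it):

```python
# out = floor((in + 2*padding - kernel) / stride) + 1
def conv_out(n, kernel, stride=1, padding=0):
    return (n + 2 * padding - kernel) // stride + 1

h = 700
for _ in range(5):                                  # VGG16 has 5 pooling stages
    h = conv_out(h, kernel=3, stride=1, padding=1)  # 3x3 convs keep the size
    h = conv_out(h, kernel=2, stride=2)             # 2x2 max-pool halves it
print(h)  # 21 (the same arithmetic gives 224 -> 7 and 1200 -> 37)
```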