VGG input issue

Hi all

When using transfer learning with VGG (and probably other models), I see lots of PyTorch kernels out there starting with

"VGG-16 takes 224x224 images as input, so we resize all of them"

in the transformation section. But this statement seems a bit misleading to me. VGG has a 7x7 adaptive average pooling layer before the classifier:
class VGG(nn.Module):

    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

so no matter what input is given, the feature map is pooled to 512x7x7 before the linear layers. What do you guys think? Am I missing something? I find this point important because upsampling from, say, 100x100 to 224x224 (resize to 256, center-crop 224, etc.) may easily introduce artifacts and hurt the model.
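As a quick check (a minimal sketch using torchvision's vgg16; random weights are fine since we only look at shapes), the flattened feature vector is always 512*7*7 regardless of the spatial input size:

import torch
from torchvision import models

# Shape check: thanks to AdaptiveAvgPool2d((7, 7)), the classifier always
# receives a 512*7*7 vector, so different input resolutions all work.
model = models.vgg16().eval()

with torch.no_grad():
    for size in (100, 224, 300):
        x = torch.randn(1, 3, size, size)
        feats = model.avgpool(model.features(x))   # always (1, 512, 7, 7)
        out = model(x)                             # always (1, 1000)
        print(size, tuple(feats.shape), tuple(out.shape))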

The adaptive pooling layer relaxes the 224x224 input restriction, but note that the pretrained weights were trained with this resolution.
If your input size is quite different from the original one, your performance might be worse.
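For reference, a typical preprocessing pipeline matching that training resolution looks like the sketch below (the mean/std values are the standard ImageNet statistics used for torchvision's pretrained models):

from torchvision import transforms

# Common ImageNet-style preprocessing at the resolution VGG was trained on:
# resize the shorter side to 256, center-crop 224x224, then normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])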

Thanks @ptrblck. So the model will work for any input size, but since the original model was trained with 224x224 inputs, we probably shouldn't expect high performance.


Yes, that would be the basic assumption.
However, your model might still work fine if the spatial size does not change very much or if you fine-tune it on the new size.
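A minimal fine-tuning sketch under that assumption (the dataset/loader name and the number of classes are hypothetical; the idea is just to replace the final classifier layer and train at the new resolution):

import torch
import torch.nn as nn
from torchvision import models

# Sketch: adapt a pretrained VGG-16 to a new resolution and class count.
# `train_loader` is a hypothetical DataLoader yielding e.g. 100x100 images.
model = models.vgg16(weights="IMAGENET1K_V1")   # older torchvision: pretrained=True
model.classifier[6] = nn.Linear(4096, 10)       # replace final layer for 10 classes

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, targets in train_loader:   # images can stay 100x100 thanks to avgpool
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()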