Hi all
When using transfer learning with VGG (and probably other architectures), I have seen lots of PyTorch kernels out there that start with
"VGG-16 takes 224x224 images as input, so we resize all of them"
in the transformation section. But this statement seems a bit misleading to me. VGG (as implemented in torchvision) has a 7x7 adaptive average pooling layer before the classifier:
import torch
import torch.nn as nn

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        # Pools whatever spatial size comes out of features down (or up) to a fixed 7x7
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()  # helper defined elsewhere in torchvision's source

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)      # always 512x7x7 from here on
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
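The key line is self.avgpool: nn.AdaptiveAvgPool2d((7, 7)) derives its pooling regions from whatever input size it receives, so it always emits a 7x7 map. A minimal check of just that layer in isolation (the feature-map sizes below are roughly what 100x100, 224x224 and 320x320 inputs would produce after the conv stages):

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((7, 7))
for hw in (3, 7, 10):
    fmap = torch.randn(1, 512, hw, hw)  # dummy feature map
    print(hw, tuple(pool(fmap).shape))  # always (1, 512, 7, 7)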
So no matter what input is given (as long as it is roughly 32x32 or larger, so the five max-pool stages don't shrink the feature map to nothing), it is pooled to 512x7x7 before the first linear layer. What do you guys think? Am I missing something? I find this point important because upscaling from, say, 100x100 to 224x224 (Resize(256) then CenterCrop(224), etc.) may easily pollute the input with interpolation artifacts.
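For what it's worth, here is a quick sanity check on the full model (a minimal sketch assuming torchvision's vgg16; I use random weights since only the shapes matter here) showing that the forward pass accepts several resolutions:

import torch
from torchvision import models

model = models.vgg16()  # random weights are fine for a pure shape check
model.eval()

with torch.no_grad():
    for size in (100, 160, 224, 320):
        x = torch.randn(1, 3, size, size)  # dummy batch at the given resolution
        out = model(x)
        print(size, tuple(out.shape))  # always (1, 1000)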