Many papers use GoogLeNet's pool5 layer as the feature extractor for images/video frames. I need a feature vector of dimension 1024. I figured out that the corresponding layer in PyTorch's torchvision.models.googlenet() is the AdaptiveAvgPool2d layer. However, the output of this layer for one image has shape [1, 1024, 1, 1]. Is this output correct? Can I simply drop the last two dimensions to get the feature vector?
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.googlenet(pretrained=True)
    # Drop the last two children (dropout and fc) so the network ends at avgpool
    lenet = nn.Sequential(*list(model.children())[:-2])
    lenet(torch.randn(1, 3, 224, 224)).shape
torch.Size([1, 1024, 1, 1])
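For what it's worth, a minimal sketch of dropping the trailing singleton dimensions (using a random tensor as a stand-in for the pooled feature map) would be:

```python
import torch

# Stand-in for the avgpool output of shape [1, 1024, 1, 1]
x = torch.randn(1, 1024, 1, 1)

# Flatten everything after the batch dimension -> shape [1, 1024]
vec = torch.flatten(x, start_dim=1)
print(vec.shape)  # torch.Size([1, 1024])
```

torch.flatten(x, start_dim=1) is usually preferred over x.squeeze() here, since squeeze() would also remove the batch dimension when the batch size is 1.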