Extracting features from GoogLeNet pool5 layer

Many papers use the GoogLeNet pool5 layer's output as features for images/video frames. I need a feature vector of dimension 1024. I figured out that the corresponding layer in PyTorch's torchvision.models.googlenet() is the AdaptiveAvgPool2d layer. However, the outputs from this layer for one image have shape [1, 1024, 1, 1]. Is this output correct? Can I simply drop the last 2 dimensions to get the feature vector?

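import torch
from torch import nn
from torchvision import models
model = models.googlenet(pretrained=True)
lenet = nn.Sequential(*list(model.children())[:-2])  # drop the trailing dropout and fc layers
lenet(torch.randn(1, 3, 224, 224)).shape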

torch.Size([1, 1024, 1, 1])

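Yes, the output is correct: the adaptive average pooling layer reduces each of the 1024 channels to a 1x1 spatial map. You can squeeze the last two dimensions and process your activation as [batch_size, 1024].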
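
As a rough sketch of the squeeze step, the snippet below uses a random tensor as a stand-in for the pooled GoogLeNet activations, so it runs without downloading weights. The batch size of 4 and the 7x7 input feature map are assumptions for illustration only. It also shows why torch.flatten(x, 1) may be a safer choice than a bare squeeze(): with a single image, squeeze() with no arguments removes the batch dimension too.

```python
import torch
from torch import nn

# Stand-in for the last layer kept in the truncated model:
# adaptive average pooling down to a 1x1 spatial map.
pool = nn.AdaptiveAvgPool2d((1, 1))

# Hypothetical activations: batch of 4, 1024 channels, 7x7 spatial size.
feature_map = torch.randn(4, 1024, 7, 7)
pooled = pool(feature_map)              # shape [4, 1024, 1, 1]

# Flatten everything after the batch dimension.
features = torch.flatten(pooled, 1)     # shape [4, 1024]

# With a single image, a bare squeeze() also drops the batch dimension.
single = pooled[:1]                     # shape [1, 1024, 1, 1]
squeezed_all = single.squeeze()         # shape [1024]
squeezed_safe = single.flatten(1)       # shape [1, 1024]

print(features.shape, squeezed_all.shape, squeezed_safe.shape)
```

If you want to keep the [batch_size, 1024] layout regardless of batch size, flatten(1) or squeeze(-1).squeeze(-1) avoids the surprise of losing the batch dimension.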
