Many papers use GoogLeNet's pool5 layer as the feature extractor for images/video frames. I need a feature vector of dimension 1024. I figured out that the corresponding layer in PyTorch's torchvision.models.googlenet() is the AdaptiveAvgPool2d layer. However, the output of this layer for one image has shape [1, 1024, 1, 1]. Is this output correct? Can I simply drop the last two dimensions to get the feature vector?
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.googlenet(pretrained=True)
    # Drop the last two children (dropout and fc) so the network ends at avgpool
    lenet = nn.Sequential(*list(model.children())[:-2])
    lenet(torch.randn(1, 3, 224, 224)).shape
torch.Size([1, 1024, 1, 1])
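For what it's worth, a minimal sketch of dropping the trailing singleton dimensions (using a random tensor as a stand-in for the pooled feature map) would be:

```python
import torch

# Stand-in for the avgpool output of shape [1, 1024, 1, 1]
x = torch.randn(1, 1024, 1, 1)

# Flatten everything after the batch dimension -> shape [1, 1024]
vec = torch.flatten(x, start_dim=1)
print(vec.shape)  # torch.Size([1, 1024])
```

torch.flatten(x, start_dim=1) is usually preferred over x.squeeze() here, since squeeze() would also remove the batch dimension when the batch size is 1.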