Using pretrained VGG-16 to get a feature vector from an image

Hi,
I want to get a feature vector for an image by passing it through a pre-trained VGG-16. I used the pretrained ResNet-50 the same way and that worked perfectly, but when I apply the same method to VGG-16, I don’t get the 4096-d vector I expect. I pieced the code together from a variety of sources; it is as follows:

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import models

vgg16_model = models.vgg16(pretrained=True)
modules = list(vgg16_model.children())[:-1]   # drop the last top-level child
vgg16_model = nn.Sequential(*modules)

data = np.moveaxis(data, 2, 0)                # (300, 400, 3) image -> (3, 300, 400)
img_var = Variable(torch.from_numpy(data).unsqueeze(0)).float()
features_var = vgg16_model(img_var)
features = features_var.data.numpy()
print(features.shape)

The variable “data” is an image numpy array of dimensions (300, 400, 3).
Hence I use moveaxis to rearrange the axes so that the channel dimension comes first (3 channels rather than 300).
The output (features.shape) I get is (1, 512, 7, 7), but I want the 4096-d vector that VGG-16 produces just before the final 1000-way classification layer.
I even tried the list(vgg16_model.classifier.children())[:-1] approach, but that did not go well either. There are a lot of discussions about this, but none of them worked for me. Let me know where I might be going wrong… Thank you!
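For reference, printing the model’s top-level children shows what that [:-1] slice is actually dropping (this assumes a torchvision version where VGG-16 exposes features, avgpool and classifier as separate children):

from torchvision import models

vgg16_model = models.vgg16(pretrained=True)
for name, child in vgg16_model.named_children():
    print(name, type(child).__name__)
# features    Sequential         (the conv layers)
# avgpool     AdaptiveAvgPool2d  (pools to 512 x 7 x 7)
# classifier  Sequential         (the fully connected head with the 4096-d layers)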

Hi,

For the VGG-16 available in torchvision.models, calling list(vgg16_model.children())[:-1] removes the whole classifier nn.Sequential, which is defined as follows:

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

So it also removes the layer that generates your 4096-d feature vector. Instead, you have to remove layers from within the nn.Sequential block shown above, like this:

vgg16_model = models.vgg16(pretrained=True)
vgg16_model.classifier = vgg16_model.classifier[:-1]  # vgg16_model.classifier is an nn.Sequential block; keep all but its last Linear
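A quick way to confirm this keeps the 4096-d layer (a sketch, assuming a recent torchvision; the random tensor just stands in for a real image):

import torch
from torchvision import models

vgg16_model = models.vgg16(pretrained=True)
vgg16_model.classifier = vgg16_model.classifier[:-1]  # drop only the final 1000-way Linear
vgg16_model.eval()                                    # disable dropout for a deterministic check

with torch.no_grad():
    out = vgg16_model(torch.randn(1, 3, 224, 224))    # dummy image batch
print(out.shape)                                      # torch.Size([1, 4096])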

Thanks for the reply, Yash…
But unfortunately, this doesn’t work either…
My modified code is:

vgg16_model = models.vgg16(pretrained=True)
vgg16_model_classifier = vgg16_model.classifier[:-1]
img_var = Variable(torch.from_numpy(data).unsqueeze(0)).float()

features_var = vgg16_model_classifier(img_var)

features = features_var.data.numpy()

print(features.shape)

Now it throws a size mismatch error…
Thanks…


Hi,

There seems to be a mistake in your code:
Change

vgg16_model_classifier=vgg16_model.classifier[:-1]

to

vgg16_model.classifier = vgg16_model.classifier[:-1]

And pass your image to vgg16_model.
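For completeness, the whole corrected flow in one place would look roughly like this (a sketch, using plain tensors instead of the deprecated Variable wrapper, and assuming a torchvision version whose VGG includes the adaptive average pool so a 300×400 input still flattens to 25088):

import numpy as np
import torch
from torchvision import models

vgg16_model = models.vgg16(pretrained=True)
vgg16_model.classifier = vgg16_model.classifier[:-1]  # keep everything up to the 4096-d layer
vgg16_model.eval()

data = np.random.rand(300, 400, 3)                    # stand-in for the real image array
img = torch.from_numpy(np.moveaxis(data, 2, 0)).unsqueeze(0).float()  # HWC -> NCHW

with torch.no_grad():
    features = vgg16_model(img)
print(features.shape)                                 # torch.Size([1, 4096])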

Oh, that’s awesome! It worked! Thanks a lot @yash1994 !

@yash1994
Hi,
If I have the following image array:

arr = np.random.rand(300, 400, 3)

And I do the following to it:

temp_obs = np.moveaxis(arr, 2, 0)
img_var = Variable(torch.from_numpy(temp_obs).unsqueeze(0)).float()

And then pass it to the VGG model like:

features_var = vgg16_model(img_var)

I get a numpy array full of zeros.
Would you know why?

Thanks…

I even tried declaring the VGG model as follows, but that doesn’t work either. This one gives dimensionality errors:

vgg16_model = models.vgg16(pretrained=True)
# vgg16_model.classifier = vgg16_model.classifier[:-1]
modules_vgg = list(vgg16_model.classifier[:-1])
vgg16_model = nn.Sequential(*modules_vgg)   # now only the fully connected layers remain

Hi,

You need to put the model in inference mode with the model.eval() function to turn off dropout/batch norm before extracting features. And try extracting features from an actual image containing an ImageNet class.
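For the preprocessing side, the usual ImageNet pipeline would be something along these lines (a sketch with a recent torchvision; the file name is hypothetical):

from PIL import Image
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),                              # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet channel statistics
                         std=[0.229, 0.224, 0.225]),
])

vgg16_model = models.vgg16(pretrained=True)
vgg16_model.classifier = vgg16_model.classifier[:-1]
vgg16_model.eval()

img = Image.open("bus.jpg").convert("RGB")              # hypothetical image file
with torch.no_grad():
    features = vgg16_model(preprocess(img).unsqueeze(0))
print(features.shape)                                   # torch.Size([1, 4096])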

This will result in a dimension error because you are redefining the model as follows:

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
)

so it expects a flat 25088-dimensional input (512 × 7 × 7 flattened), and there are no convolutional layers left in front of it to produce that.
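To make that concrete, using only the classifier layers would require running the convolutional part and flattening by hand first, mirroring what VGG’s own forward pass does (a sketch, assuming a torchvision version with the avgpool child):

import torch
import torch.nn as nn
from torchvision import models

vgg16_model = models.vgg16(pretrained=True).eval()
classifier_head = nn.Sequential(*list(vgg16_model.classifier[:-1]))

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)     # dummy image batch
    x = vgg16_model.features(x)         # conv layers -> (1, 512, 7, 7)
    x = vgg16_model.avgpool(x)          # adaptive pool, still (1, 512, 7, 7)
    x = torch.flatten(x, 1)             # -> (1, 25088), what the first Linear expects
    features = classifier_head(x)       # -> (1, 4096)
print(features.shape)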

Okay…
@yash1994 I just added model.eval() to the code and then tried to extract features, but I still get an array of zeros…
I also tried passing a real image of dimensions 300×400×3. Do you think that is a problem?

Thanks…

Actually, I just iterated over the entire array and saw that not all values are zero, but quite a few of them are. I don’t understand why they are zero, though…

Just take two images of a bus (an ImageNet class) from Google Images, extract the feature vectors, and compute their cosine similarity. If the cosine similarity is high and the feature vectors are similar, then there is no problem; otherwise there is some issue.
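As an aside, some exact zeros are expected in any case: with the final Linear removed, the truncated classifier now ends in a ReLU, which clamps negative activations to exactly zero. A similarity check along the lines above could look like this sketch (bus1.jpg and bus2.jpg are hypothetical bus photos):

from PIL import Image
import torch
import torch.nn.functional as F
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.vgg16(pretrained=True)
model.classifier = model.classifier[:-1]   # 4096-d feature extractor
model.eval()

def extract(path):
    # hypothetical helper: load one image file and return its (1, 4096) feature vector
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0))

feat1 = extract("bus1.jpg")                       # hypothetical file names
feat2 = extract("bus2.jpg")
print(F.cosine_similarity(feat1, feat2).item())   # close to 1.0 for similar images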

Okay! That makes sense… Thank you very much…