Problem with extracting the feature

I need to extract features from images before classifying them, i.e. remove the last (classification) layer of the model, using vit-pytorch.

I tried to bypass the classification layer by setting
self.mlp_head = nn.Identity()

and then ran this code:

from max_vit import MaxViT
from extractor import Extractor

model = MaxViT(
    num_classes = 0,
    dim = 192,
    depth = (2, 6, 14, 2)
)

# img is the input image batch (its definition is not shown here)
feature = model(img)

but the output for each image has shape torch.Size([1, 1536, 7, 7]),
and the saved file is 300 MB for 1000 images! Is there anything wrong with the code, please?

Can someone help, please?

The feature dimension looks reasonable, as the mlp_head would not be applied, as seen here. I haven’t checked the expected shape, since the einops reductions don’t show the actual values, but I would expect to see this or a similar shape.
Could you explain what the issue is or why this shape would not be expected?
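As a rough sanity check on the reported file size, here is a small sketch of the arithmetic. It assumes the features are stored as float32 (4 bytes per value) with no compression, which the thread does not state. The variable names are only illustrative.

```python
# Shape reported in the thread for one image: [1, 1536, 7, 7]
channels, height, width = 1536, 7, 7
num_images = 1000
bytes_per_value = 4  # assumption: float32 storage, no compression

# Size when the full spatial feature map is saved for every image
values_per_image = channels * height * width
total_bytes = values_per_image * bytes_per_value * num_images

# Size if the 7x7 spatial grid were averaged down to one value per channel
pooled_bytes = channels * bytes_per_value * num_images

print(f"values per image: {values_per_image}")
print(f"unpooled features: {total_bytes / 1e6:.1f} MB")
print(f"pooled features:   {pooled_bytes / 1e6:.1f} MB")
```

Under these assumptions the unpooled features come to roughly 301 MB, so a 300 MB file for 1000 images is consistent with the reported shape and does not by itself point to a bug.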

Is there any help, please?

It’s unclear to me where you are currently stuck. You didn’t follow up from my previous post but are asking for help again so I guess you are hitting a different issue?

I wrote more details about the problem, but after a while I lost hope of finding a solution, so I deleted it. If you can help, I will post it again. I hope to find help. Thanks.
Problem with extracting the feature - #4 by mathwseg

Your model seems to be overfitting on the training set, and I don’t think that your feature extraction is necessarily wrong.
Overfitting can have different causes, e.g. the model capacity might be too large for the given data, so your model is able to easily memorize all training samples.

Is the code I wrote for extracting the features right, please?

model = MaxViT(
    num_classes = 0,
    dim = 128,
    depth = (2, 6, 14, 2)
)

# keep only the first module of mlp_head (the Reduce layer)
model.mlp_head = model.mlp_head[0]

from this class vit-pytorch/ at main · lucidrains/vit-pytorch · GitHub

It depends on which features you want to use. Initially, you replaced the entire mlp_head with an nn.Identity layer; now you are using only the Reduce layer. Both sound reasonable, as they are applied before the final linear layer, which would act as the classifier.
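To make the difference between the two options concrete, here is a minimal sketch in plain PyTorch. It does not run MaxViT itself: a random tensor stands in for the backbone output with the shape reported earlier in the thread, and the spatial mean is used as a stand-in for the einops Reduce layer, on the assumption that this layer averages over the height and width dimensions.

```python
import torch
from torch import nn

# Stand-in for the backbone output (assumption: same shape as reported
# in the thread; this is random data, not a real MaxViT forward pass).
feature_map = torch.randn(1, 1536, 7, 7)

# Option 1: mlp_head replaced by nn.Identity, so the spatial map is returned as-is.
identity_head = nn.Identity()
identity_features = identity_head(feature_map)

# Option 2: only the reduction is kept. Averaging over height and width is
# used here as an assumed plain-PyTorch equivalent of the Reduce layer.
pooled_features = feature_map.mean(dim=(2, 3))

# If a flat vector is needed from option 1, the map can be flattened per image.
flattened_features = identity_features.flatten(start_dim=1)

print(identity_features.shape)   # spatial feature map
print(pooled_features.shape)     # one value per channel
print(flattened_features.shape)  # all spatial values in one vector
```

The pooled variant gives a much smaller vector per image, while the identity variant keeps the spatial layout. Which one works better depends on the downstream classifier.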

I appreciate your reply. So both ways give the features before classification, or is there something else I should do?

I think both are valid approaches to extracting the features from the model, and you would need to check which one works better for your use case.
