I need to extract features from images before classifying them, i.e. remove the last (classification) layer of a model from vit-pytorch.
I tried to bypass the classification layer by setting self.mlp_head = nn.Identity()
and then ran this code:
from max_vit import MaxViT
from extractor import Extractor
model = MaxViT(
    num_classes = 0,
    dim = 192,
    depth = (2, 6, 14, 2)
)
feature = model(img)
but the output for each image has the shape torch.Size([1, 1536, 7, 7]),
and the saved file is 300 MB for 1000 images! Is there anything wrong with the code?
The feature dimension looks reasonable as the mlp_head would not be applied as seen here. I haven’t checked the expected shape as the einops reductions don’t show the actual values, but I would expect to see this or a similar shape.
Could you explain what the issue is or why this shape would not be expected?
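The reported file size also follows from this shape. The arithmetic below is a rough sketch, assuming the features are stored as float32 (4 bytes per value) without compression; the variable names are only illustrative.

```python
# Back-of-the-envelope size of the unpooled features reported in this thread.
channels, height, width = 1536, 7, 7   # per-image shape quoted above
bytes_per_value = 4                    # assumption: float32 storage
num_images = 1000

values_per_image = channels * height * width           # 75264 values
bytes_per_image = values_per_image * bytes_per_value   # 301056 bytes
total_mb = num_images * bytes_per_image / 1e6          # about 301 MB

# For comparison: one 1536-dim vector per image (spatial dims averaged away).
pooled_total_mb = num_images * channels * bytes_per_value / 1e6  # about 6 MB

print(values_per_image, total_mb, pooled_total_mb)
```

So roughly 300 MB for 1000 images is what a [1, 1536, 7, 7] float32 tensor per image would take up; it does not by itself point to a bug.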
It’s unclear to me where you are currently stuck. You didn’t follow up on my previous post but are asking for help again, so I guess you are hitting a different issue?
I wrote up more details about the problem, but after a while I got discouraged about finding a solution and deleted the post. If you can help, I will post it again. I hope to find help. Thanks.
Problem with extracting the feature - #4 by mathwseg
Your model seems to be overfitting on the training set and I don’t think that your feature extraction is necessarily wrong.
Overfitting can have different reasons, e.g. the model capacity might be too large for the given data and your model is thus able to easily learn all training samples.
It depends on which features you want to use. Initially, you replaced the entire mlp_head with an nn.Identity layer; now you are using the output of the Reduce layer. Both sound reasonable, as they are applied before the final linear layer, which would act as the classifier.
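To make the difference between the two options concrete, here is a minimal sketch. It assumes the Reduce layer is a mean over the spatial dimensions (something like 'b d h w -> b d' with 'mean'), which I have not verified against the repository, and it uses a random tensor in place of the real backbone output.

```python
import torch
import torch.nn.functional as F

# Stand-in for the backbone output when mlp_head is nn.Identity():
# (batch, channels, height, width), matching the shape reported above.
feature_map = torch.randn(2, 1536, 7, 7)

# Assumed behaviour of the Reduce layer: average over height and width,
# leaving one 1536-dim vector per image.
pooled = feature_map.mean(dim=(2, 3))

# The same result via adaptive average pooling, as a cross-check.
pooled_alt = F.adaptive_avg_pool2d(feature_map, 1).flatten(1)

print(feature_map.shape, pooled.shape)
```

The unpooled map keeps spatial information but is 49 times larger per image; the pooled vector is the more common input for a separate classifier.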