As I understand image captioning, I need features from the images rather than class predictions. So if I have a classification model, the classification head can be dropped. Does the code here implement that idea?
The modifications are:

    from functools import partial
    import torch.nn as nn

    def gradient(model, freeze: bool):
        for parameter in model.parameters():  # freeze/unfreeze every parameter
            parameter.requires_grad = not freeze

    def vit_small(patch_size=16, **kwargs):
        model = VisionTransformer(patch_size=patch_size, embed_dim=384, depth=12,
                                  num_heads=6, mlp_ratio=4, qkv_bias=True,
                                  num_classes=0,  # num_classes=0 drops the classification head
                                  norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
        gradient(model, freeze=True)
        return model
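One way to sanity-check a freezing helper like `gradient` is to run it on a small stand-in module and inspect `requires_grad` afterwards; this is a minimal sketch using a plain `nn.Linear` in place of the real `VisionTransformer`, whose definition I don't have here:

```python
import torch.nn as nn

def gradient(model, freeze: bool):
    # Same idea as above: toggle requires_grad on every parameter
    for parameter in model.parameters():
        parameter.requires_grad = not freeze

stand_in = nn.Linear(8, 4)  # stand-in for the real VisionTransformer
gradient(stand_in, freeze=True)
print(all(not p.requires_grad for p in stand_in.parameters()))  # True
```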
Is that right?
After the modifications, do I need to re-train the model on my dataset?
What results should I expect?
Usually, if you want to get feature maps from a model, the typical approach is to edit the forward function in the model definition to return the intermediate feature maps in addition to the final output; see e.g. How to extract features of an image from a trained model - #6 by fmassa
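If you'd rather not edit the model definition, a forward hook can grab the intermediate output instead. A minimal sketch, using a tiny `nn.Sequential` as a stand-in for your backbone + classification head:

```python
import torch
import torch.nn as nn

# Stand-in model: pretend "backbone" followed by a classification head
model = nn.Sequential(
    nn.Linear(8, 16),   # "backbone"
    nn.ReLU(),
    nn.Linear(16, 10),  # "classification head"
)

features = {}

def hook(module, inputs, output):
    # Save the intermediate feature map produced by this layer
    features["backbone"] = output.detach()

# Register the hook on the layer whose output you want
handle = model[1].register_forward_hook(hook)

x = torch.randn(4, 8)
logits = model(x)              # normal forward pass
feats = features["backbone"]   # intermediate features, shape (4, 16)
handle.remove()                # clean up when done
```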
Usually some kind of finetuning would be needed (at least), since classification is a somewhat different domain from captioning.
Excuse me, and thanks for replying. If I have a detection model and need to use it for captioning, do I need to freeze any layers of the model, or is freezing only for classification models?
I don’t think there is a strict answer for this, because it depends on the entire model architecture and not just the backbone. You can try training with or without freezing, though training without freezing will probably give you better results.