How can I get feature maps for images?

If you want to get feature maps from a model, the typical approach is to edit the forward function in the model definition so that it returns the intermediate feature maps in addition to the final output. See, e.g., How to extract features of an image from a trained model - #6 by fmassa
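As a rough sketch of that approach (not the code from the linked post), here is a toy model whose forward returns both the logits and the intermediate feature maps. TinyCNN, its layer sizes, and the 32x32 random input are all made up for illustration; with your own model you would return whichever activations you care about.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical toy classifier, used only to illustrate the pattern."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        # Keep a reference to each intermediate activation.
        f1 = self.block1(x)
        f2 = self.block2(f1)
        logits = self.fc(self.pool(f2).flatten(1))
        # Return the feature maps in addition to the final output.
        return logits, [f1, f2]


model = TinyCNN().eval()
images = torch.randn(4, 3, 32, 32)  # stand-in batch of 4 RGB images

with torch.no_grad():
    logits, feature_maps = model(images)

shapes = [tuple(f.shape) for f in feature_maps]
print(tuple(logits.shape), shapes)
```

Note that any code calling the model now has to unpack two return values, so existing training or evaluation loops need a matching change.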

Some fine-tuning would usually be needed at the very least, since classification is a rather different task from captioning.