How to get embeddings from a pretrained model

I am trying to write a Siamese network of two embedding networks that share weights. I found several examples online for MNIST and other datasets of small images, but my images are much larger (around 500x1000). Please bear with me because I’m very confused, and I have several questions here!

  1. How should I modify the embedding networks to take in larger images? Should I use a model made for larger images, like ResNet?
    1a) And if I do that, should I resize my images to fit ResNet’s input size? Or is there a way to modify ResNet to work with larger images?

  2. If I use a pretrained network, how should I modify it so that it still works with my dataset? (Do I train starting from the pretrained weights, or do I just freeze those weights and work with them?)

  3. Do I need to remove the output layer of the model in order to get embeddings?

  4. How would I incorporate all these things into my Siamese network code?

  1. You could use another model architecture or modify your current model to accept larger images. It’s hard to tell which approach would work better, so you would have to run some experiments (a minimal check for the pretrained approach is sketched in the first example below).
    The torchvision.models.resnet... implementations already accept larger input images, since an adaptive pooling layer is used before flattening the activation and feeding it into the first linear layer of the classifier.

  2. You could test both approaches for your use case. The best strategy depends on your use case and dataset. If your dataset is “similar” to ImageNet, freezing the pretrained layers might work fine (the second sketch below shows both options).

  3. I’m not completely sure what “embeddings” refers to in the posted model, but given your current implementation I would assume you want to reuse the resnet for both inputs and then add a custom linear layer to create the final output. To get the pooled features instead of the ImageNet logits, you can replace the last linear layer (see the third sketch below).

  4. Try to replace the self.conv module with e.g. the modified resnet, as in the last sketch below.
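
Here is a minimal sketch for point 1, assuming a torchvision resnet18 and the ~500x1000 resolution you mentioned, showing that the forward pass works with larger inputs thanks to the adaptive pooling layer:

```python
import torch
import torchvision

# weights argument is the torchvision >= 0.13 API; older versions use pretrained=True
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()

# single 3-channel image at roughly the resolution from the question
x = torch.randn(1, 3, 500, 1000)

with torch.no_grad():
    out = model(x)

# the adaptive average pooling before the classifier makes this work,
# so the output is still the [batch_size, 1000] ImageNet logits
print(out.shape)  # torch.Size([1, 1000])
```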
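For point 2, a sketch of the two strategies; the embedding size of 128 is just an arbitrary choice for illustration:

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Option A: fine-tune everything starting from the pretrained weights.
# Nothing to change here besides usually picking a smaller learning rate.

# Option B: freeze the pretrained backbone and only train newly added layers.
for param in model.parameters():
    param.requires_grad = False

# layers you replace afterwards are freshly initialized and trainable again
model.fc = nn.Linear(model.fc.in_features, 128)

# pass only the trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```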
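For point 3, replacing the last linear layer with nn.Identity is one way to get the pooled features as “embeddings” (again assuming resnet18):

```python
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# drop the 1000-class ImageNet head so the forward pass returns the
# pooled 512-dim features instead of the class logits
feature_dim = backbone.fc.in_features  # 512 for resnet18
backbone.fc = nn.Identity()

x = torch.randn(1, 3, 500, 1000)
with torch.no_grad():
    emb = backbone(x)
print(emb.shape)  # torch.Size([1, 512])
```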
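For point 4, since your code isn’t posted here, the class and method names below (SiameseNetwork, forward_once) are assumptions based on the common MNIST examples; the idea is just to swap the small self.conv stack for the modified resnet and reuse it for both inputs:

```python
import torch
import torch.nn as nn
import torchvision

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        # pretrained backbone replaces the small self.conv stack; its
        # classification head is removed so it returns pooled features
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        in_features = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # custom linear layer mapping the features to the final embedding
        self.head = nn.Linear(in_features, embedding_dim)

    def forward_once(self, x):
        return self.head(self.backbone(x))

    def forward(self, x1, x2):
        # both inputs go through the same modules, so the weights are shared
        return self.forward_once(x1), self.forward_once(x2)

model = SiameseNetwork()
img1 = torch.randn(2, 3, 500, 1000)
img2 = torch.randn(2, 3, 500, 1000)
emb1, emb2 = model(img1, img2)
print(emb1.shape, emb2.shape)  # torch.Size([2, 128]) torch.Size([2, 128])
```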