I am trying to build a small siamese network (with the aim of getting encodings from the last/pre-last layer), and I would like to use a pretrained model plus the extra layers needed to produce the encodings.
I have something like this at the moment, but the results don't look great, so I now wonder if this is the correct way to build on top of a pretrained model.
```python
class PretrainedSiamese(nn.Module):
    def __init__(self):
        super(PretrainedSiamese, self).__init__()
        self.pt_model = torch.hub.load('pytorch/vision', 'some_pretrained_model', pretrained=True)
        # Freeze the pretrained weights
        for param in self.pt_model.parameters():
            param.requires_grad = False
        # Replace the classifier head (the new layer's parameters stay trainable)
        self.pt_model.classifier = torch.nn.Linear(
            in_features=self.pt_model.classifier.in_features, out_features=2)
        self.linear1 = nn.Linear(in_features=2, out_features=256, bias=True)
        self.linear2 = nn.Linear(in_features=256, out_features=128, bias=True)  # encoder layer
        self.linear3 = nn.Linear(in_features=128, out_features=2, bias=True)
```
The forward I have is simply:
```python
out = self.pt_model(x)
out = out.view(out.size(0), -1)  # flatten per sample (already [N, 2] here)
out = self.linear1(out)
out = F.relu(out)
out = self.linear2(out)
# ... do siamese, i.e. loop twice
```
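For context, the "loop twice" part of a siamese forward is usually one shared branch applied to both inputs. Here is a minimal, runnable sketch of that pattern — `TinyBackbone` is a hypothetical stand-in for the frozen pretrained model (so the example runs without downloading weights); in the real module it would be `self.pt_model`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Stand-in for the frozen pretrained model: maps 8-d input to 2 logits."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(x)

class SiameseSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.pt_model = TinyBackbone()
        self.linear1 = nn.Linear(2, 256)
        self.linear2 = nn.Linear(256, 128)  # encoder layer

    def forward_once(self, x):
        out = self.pt_model(x)
        out = out.view(out.size(0), -1)
        out = F.relu(self.linear1(out))
        return self.linear2(out)  # 128-d encoding

    def forward(self, x1, x2):
        # "Loop twice": the SAME weights process both inputs
        return self.forward_once(x1), self.forward_once(x2)

net = SiameseSketch()
e1, e2 = net(torch.randn(4, 8), torch.randn(4, 8))
print(e1.shape, e2.shape)  # torch.Size([4, 128]) torch.Size([4, 128])
```

The two encodings can then be compared with a distance-based loss (e.g. contrastive or triplet), which trains only the small head since the backbone is frozen.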
So, I pass the outputs (logits) from the pretrained classifier into a small network attached on top, and take the encoding from the pre-final layer. Does this sound sensible? I am unsure, now that I am getting terrible results.