I am trying to build a small siamese network (with the aim of getting encodings from the last/pre-last layer), and I would like to use a pretrained model plus the extra layers needed to produce the encodings.
I have something like this at the moment, but the results don't look great, so I now wonder if this is the correct way to build on top of a pretrained model.
```python
class PretrainedSiamese(nn.Module):
    def __init__(self):
        super(PretrainedSiamese, self).__init__()
        self.pt_model = torch.hub.load('pytorch/vision', 'some_pretrained_model', pretrained=True)
        # Freeze the pretrained weights
        for param in self.pt_model.parameters():
            param.requires_grad = False
        # Replace the classifier head (the new layer's parameters stay trainable)
        self.pt_model.classifier = torch.nn.Linear(
            in_features=self.pt_model.classifier.in_features, out_features=2)
        self.linear1 = nn.Linear(in_features=2, out_features=256, bias=True)
        self.linear2 = nn.Linear(in_features=256, out_features=128, bias=True)  # encoder layer
        self.linear3 = nn.Linear(in_features=128, out_features=2, bias=True)
```
The forward I have is simply:
```python
out = self.pt_model(x)
out = out.view(out.size(0), -1)  # flatten per sample (already [N, 2] here)
out = self.linear1(out)
out = F.relu(out)
out = self.linear2(out)
# ... do siamese, i.e. loop twice
```
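For context, the "loop twice" part of a siamese forward is usually one shared branch applied to both inputs. Here is a minimal, runnable sketch of that pattern — `TinyBackbone` is a hypothetical stand-in for the frozen pretrained model (so the example runs without downloading weights); in the real module it would be `self.pt_model`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Stand-in for the frozen pretrained model: maps 8-d input to 2 logits."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(x)

class SiameseSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.pt_model = TinyBackbone()
        self.linear1 = nn.Linear(2, 256)
        self.linear2 = nn.Linear(256, 128)  # encoder layer

    def forward_once(self, x):
        out = self.pt_model(x)
        out = out.view(out.size(0), -1)
        out = F.relu(self.linear1(out))
        return self.linear2(out)  # 128-d encoding

    def forward(self, x1, x2):
        # "Loop twice": the SAME weights process both inputs
        return self.forward_once(x1), self.forward_once(x2)

net = SiameseSketch()
e1, e2 = net(torch.randn(4, 8), torch.randn(4, 8))
print(e1.shape, e2.shape)  # torch.Size([4, 128]) torch.Size([4, 128])
```

The two encodings can then be compared with a distance-based loss (e.g. contrastive or triplet), which trains only the small head since the backbone is frozen.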
So, I pass the outputs (logits) from the pretrained classifier into a small network attached on top, and take the encoding from the pre-final layer. Does this sound sensible? I am unsure, now that I am getting terrible results.