I have a dataset that looks like this:

```
Image_1 Image_2 Image_3
A B C
... ... ...
```

If image A is similar to image B, the row is assigned label 1; otherwise it is assigned label 0.

I first use a pre-trained ResNet-18 to extract features from each RGB image, which gives a 1000-dimensional vector per image. I then build a network on top of these features and train it with a triplet loss. Here is the relevant part of my code:

```
import torch


class Network(torch.nn.Module):
    # 256 is only a placeholder default for the hidden size.
    def __init__(self, n_feature=1000, n_hidden_1=256, n_output=10):
        super().__init__()
        # Two-layer MLP mapping a ResNet feature vector to an embedding.
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_feature, n_hidden_1),
            torch.nn.BatchNorm1d(n_hidden_1),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden_1, n_output),
        )

    def forward(self, x):
        return self.net(x)
```

Training step:

```
model.train()  # BatchNorm uses batch statistics while training
for step, (batch_anchor, batch_positive, batch_negative) in enumerate(train_loader):
    optimizer.zero_grad()
    # Embed all three images of the triplet with the same network.
    anchor_out = model(batch_anchor)
    positive_out = model(batch_positive)
    negative_out = model(batch_negative)
    loss = loss_func(anchor_out, positive_out, negative_out)
    loss.backward()
    optimizer.step()
```

The loss function and optimizer are defined as follows:

```
import torch
from torch import optim

optimizer = optim.Adam(model.parameters(), lr=0.002)
loss_func = torch.nn.TripletMarginLoss()
```
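For reference, with its default arguments (`margin=1.0`, `p=2`), `torch.nn.TripletMarginLoss` computes the following per triplet, up to a small numerical epsilon inside the distance, and then averages over the batch. The symbols $a$, $p$, $n$ are shorthand introduced here for the anchor, positive, and negative embeddings:

```latex
L(a, p, n) = \max\bigl( \lVert a - p \rVert_2 - \lVert a - n \rVert_2 + 1.0,\; 0 \bigr)
```

The loss is zero once the negative is at least one margin farther from the anchor than the positive is.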

After training, I evaluate the network on the validation set:

```
model.eval()  # BatchNorm uses running statistics at inference time
with torch.no_grad():
    anchor_out_val = model(val_data_anchor).numpy()
    positive_out_val = model(val_data_positive).numpy()
    negative_out_val = model(val_data_negative).numpy()
```

I then use the L2 norm (Euclidean distance) between the embeddings to measure similarity and assign labels. This works well on the validation set: I get 80% accuracy, measured with `accuracy_score` from sklearn. On the test set, however, I only get 50% accuracy. Can anyone tell me why? Is the L2 norm a poor similarity metric here, or is the problem in the network itself?
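To make the labelling step concrete, here is a minimal sketch of one way it could work. The exact rule is not shown in this post, so the decision rule is an assumption: a triplet gets label 1 when the anchor embedding is closer (in L2 distance) to the second image's embedding than to the third's. The function name `predict_labels` and all array values are hypothetical.

```python
import numpy as np


def predict_labels(anchor, positive, negative):
    """Return 1 where the anchor is closer (L2) to `positive` than to `negative`, else 0."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return (d_pos < d_neg).astype(int)


# Hypothetical 2-D embeddings for three triplets (illustrative values only).
anchor = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
positive = np.array([[0.1, 0.0], [3.0, 3.0], [2.0, 0.5]])
negative = np.array([[1.0, 1.0], [1.0, 1.2], [5.0, 5.0]])

pred = predict_labels(anchor, positive, negative)
true = np.array([1, 1, 1])  # hypothetical ground-truth labels
# Fraction of correct predictions: the same quantity sklearn's accuracy_score reports.
accuracy = float(np.mean(pred == true))
print(pred, accuracy)
```

In this toy example the second triplet is misclassified, so the accuracy is two out of three. Under this rule, 50% on a balanced test set is what random guessing would give.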