Setting margin in contrastive loss

Hi,
I’m trying to retrain siamese network with contrastive loss - I’ve pretrained the net for classification and then replaced classification fc layer with new fc layer of size 512.

However, the net doesn't seem to learn at all. I suspect this is caused by the margin in the contrastive loss. Here I've learned that if I L2-normalize the output features, I can set a constant margin and forget about tuning it. But it doesn't seem to work. I've written it like this:

import torch
import torch.nn.functional as F

margin = 2
label_batch = (class_labels_1 != class_labels_2).to(device).float()  # 0: similar pair, 1: different pair

output1 = net(img_batch_1)
output2 = net(img_batch_2)

# L2-normalize the embeddings (the idea being that a constant margin then makes sense)
o1 = F.normalize(output1, p=2, dim=1)
o2 = F.normalize(output2, p=2, dim=1)
# note: the distance below is computed on the raw outputs, not on the normalized o1/o2
euclidean_distance = F.pairwise_distance(output1, output2)

loss_contrastive = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                              label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
optimizer.zero_grad()
loss_contrastive.backward()
optimizer.step()
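As a side note (a standalone sanity check I tried, separate from the training code above): the Euclidean distance between L2-normalized vectors can never exceed 2, so with normalized features a margin of 2 is already the largest distance possible.

import torch
import torch.nn.functional as F

# random "embeddings", normalized to unit length
a = F.normalize(torch.randn(1000, 512), p=2, dim=1)
b = F.normalize(torch.randn(1000, 512), p=2, dim=1)
print(F.pairwise_distance(a, b).max())  # never above 2.0: opposite unit vectors are exactly 2 apart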

Does it look OK, or should I set the margin differently?

By “the net seems not to learn at all” I mean that the training set loss drops very quickly (from 21 to 0.8), but the test set loss doesn't change. On the training set, the mean distance between dissimilar pairs drops from ~2.2 to ~1.2. Why does it drop? Shouldn't it end up somewhere above 2.0 (above the margin)?

Did you find an answer to your question? I have the same problem…

So hard or semi-hard negative mining is the key; without it the net learned to correctly predict only the easy examples. Also keep a 1:1 ratio between positive and negative pairs, or at least weight them in the loss (like the weight param in e.g. nn.CrossEntropyLoss). A rough sketch of what I mean is below.
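This is a simplified sketch, not the exact code from my project; the function name and the pick-the-k-hardest-negatives selection are just illustrative.

import torch

def contrastive_loss_with_mining(dist, label, margin=1.0):
    # label: 0 = similar pair, 1 = dissimilar pair
    pos_idx = (label == 0).nonzero(as_tuple=True)[0]
    neg_idx = (label == 1).nonzero(as_tuple=True)[0]
    # hard negative mining: keep only the closest (hardest) dissimilar pairs,
    # as many as there are similar pairs, which also gives the 1:1 ratio
    k = min(pos_idx.numel(), neg_idx.numel())
    hard_neg_idx = neg_idx[dist[neg_idx].topk(k, largest=False).indices]
    pos_term = dist[pos_idx].pow(2)
    neg_term = torch.clamp(margin - dist[hard_neg_idx], min=0.0).pow(2)
    # alternatively, weight the two terms instead of enforcing the 1:1 ratio
    return torch.cat([pos_term, neg_term]).mean()

# usage in the loop above would be something like:
# loss_contrastive = contrastive_loss_with_mining(euclidean_distance, label_batch, margin)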

Also, somehow on my machine the loss was computed incorrectly with

loss_contrastive = torch.mean((1-label_batch) * torch.pow(euclidean_distance, 2) +
                                          label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

and this

loss_contrastive = torch.mean((1-label_batch) * torch.pow(euclidean_distance, 2) +
                                          (label_batch) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

gave me the loss correctly. I don't remember whether I ever found the root cause of the parentheses issue or just didn't have time for it. You can try it yourself; I think this problem doesn't occur for everyone.
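Mathematically those extra parentheses around label_batch shouldn't change anything, so if you want to check it on your machine, something like this (random tensors, just for the comparison) should tell you whether the two expressions really differ for you:

import torch

margin = 2
euclidean_distance = torch.rand(64) * 3
label_batch = torch.randint(0, 2, (64,)).float()

loss_a = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                    label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
loss_b = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                    (label_batch) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
print(torch.allclose(loss_a, loss_b))  # expected: True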

For me, all three of these things worked.