Setting margin in contrastive loss

Hi,
I’m trying to retrain siamese network with contrastive loss - I’ve pretrained the net for classification and then replaced classification fc layer with new fc layer of size 512.

However, the net doesn't seem to learn at all. I suspect this is caused by the margin in the contrastive loss. Here I've learned that if I L2-normalize the output features, I can set a constant margin and forget about tuning it. But it doesn't seem to work. I've written it like this:

import torch
import torch.nn.functional as F

margin = 2
label_batch = (class_labels_1 != class_labels_2).to(device).float()  # 0: similar pair, 1: different pair

output1 = net(img_batch_1)
output2 = net(img_batch_2)

# L2-normalize the embeddings (the idea being that a constant margin then makes sense)
o1 = F.normalize(output1, p=2, dim=1)
o2 = F.normalize(output2, p=2, dim=1)
# note: the distance below is computed on the raw outputs, not on the normalized o1/o2
euclidean_distance = F.pairwise_distance(output1, output2)

loss_contrastive = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                              label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
optimizer.zero_grad()
loss_contrastive.backward()
optimizer.step()
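As a side note (a standalone sanity check I tried, separate from the training code above): the Euclidean distance between L2-normalized vectors can never exceed 2, so with normalized features a margin of 2 is already the largest distance possible.

import torch
import torch.nn.functional as F

# random "embeddings", normalized to unit length
a = F.normalize(torch.randn(1000, 512), p=2, dim=1)
b = F.normalize(torch.randn(1000, 512), p=2, dim=1)
print(F.pairwise_distance(a, b).max())  # never above 2.0: opposite unit vectors are exactly 2 apart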

Does it look OK, or should I set the margin differently?

By “the net seems not to learn at all” I mean that the training set loss drops very quickly (from 21 to 0.8), but the test set loss doesn't change. On the training set, the mean distance between dissimilar pairs drops from ~2.2 to ~1.2. Why does it drop? Shouldn't it end up somewhere above 2.0 (above the margin)?

Did you find an answer to your question? I have the same problem…

So hard or semi-hard negative mining is the key; without it the net learned to correctly predict only the easy examples. Also keep a 1:1 ratio between positive and negative pairs, or at least weight them in the loss (like the weight param in e.g. nn.CrossEntropyLoss). A rough sketch of what I mean is below.
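This is a simplified sketch, not the exact code from my project; the function name and the pick-the-k-hardest-negatives selection are just illustrative.

import torch

def contrastive_loss_with_mining(dist, label, margin=1.0):
    # label: 0 = similar pair, 1 = dissimilar pair
    pos_idx = (label == 0).nonzero(as_tuple=True)[0]
    neg_idx = (label == 1).nonzero(as_tuple=True)[0]
    # hard negative mining: keep only the closest (hardest) dissimilar pairs,
    # as many as there are similar pairs, which also gives the 1:1 ratio
    k = min(pos_idx.numel(), neg_idx.numel())
    hard_neg_idx = neg_idx[dist[neg_idx].topk(k, largest=False).indices]
    pos_term = dist[pos_idx].pow(2)
    neg_term = torch.clamp(margin - dist[hard_neg_idx], min=0.0).pow(2)
    # alternatively, weight the two terms instead of enforcing the 1:1 ratio
    return torch.cat([pos_term, neg_term]).mean()

# usage in the loop above would be something like:
# loss_contrastive = contrastive_loss_with_mining(euclidean_distance, label_batch, margin)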

Also, somehow on my machine the loss was computed incorrectly with

loss_contrastive = torch.mean((1-label_batch) * torch.pow(euclidean_distance, 2) +
                                          label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

and this

loss_contrastive = torch.mean((1-label_batch) * torch.pow(euclidean_distance, 2) +
                                          (label_batch) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

gave me the loss correctly. I don't remember whether I ever found the root cause of the parentheses issue or just didn't have time for it. You can try it yourself; I think this problem doesn't occur for everyone.
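Mathematically those extra parentheses around label_batch shouldn't change anything, so if you want to check it on your machine, something like this (random tensors, just for the comparison) should tell you whether the two expressions really differ for you:

import torch

margin = 2
euclidean_distance = torch.rand(64) * 3
label_batch = torch.randint(0, 2, (64,)).float()

loss_a = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                    label_batch * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
loss_b = torch.mean((1 - label_batch) * torch.pow(euclidean_distance, 2) +
                    (label_batch) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
print(torch.allclose(loss_a, loss_b))  # expected: True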

For me, all three of these things worked.