Triplet loss doesn't converge

I’m doing a classification task with a training set of 20000 images over 1000 labels. I’m using AlexNet with triplet loss. The problem is that the loss usually gets stuck at the margin of the triplet loss. I tried adjusting the learning rate from 0.01 down to 0.000001 and the momentum from 0.9 down to 0.0009. It worked once, and the loss converged toward zero. But most of the time it doesn’t work, even with the same settings as the time it worked. Can anyone tell me what I should do?

Here are the settings of my model.

model_color = AlexNet(3)
criterion = TripletLoss(margin=1.0)
optimizer_color = optim.SGD(model_color.parameters(), lr=0.00001, momentum=0.009)

I use online triplet generation with the batch-hard strategy.

Here’s my training code. all_anchors_ contains all images in the current batch; get_positive returns the hardest positive for an anchor, and get_negative likewise returns the hardest negative.

for index, anchor in enumerate(all_anchors_):
    positive = get_positive(anchor)
    negative = get_negative(anchor)


    a_output = model(anchor)
    p_output = model(torch.stack(positive))
    n_output = model(torch.stack(negative))

    loss = criterion(a_output, p_output, n_output)


    running_loss +=
    output =

I examined the loss values and found that they usually all get stuck at 1, which is the margin of the triplet loss.
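For reference, the batch-hard strategy mentioned above can also be computed for the whole batch at once instead of looping over anchors. The following is only a sketch under assumptions: batch_hard_triplet_loss is a hypothetical helper (it is not the TripletLoss, get_positive or get_negative from the post, whose code is not shown), it uses Euclidean distance, and it expects the batch to contain at least one anchor that has both a positive and a negative.

```python
import torch
import torch.nn.functional as F


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard triplet loss for one batch (hypothetical helper)."""
    # Pairwise Euclidean distances, shape (B, B). The clamp keeps the
    # sqrt gradient finite on the zero-distance diagonal.
    diff = embeddings.unsqueeze(1) - embeddings.unsqueeze(0)
    dist = diff.pow(2).sum(dim=-1).clamp_min(1e-12).sqrt()

    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye   # positives: same label, excluding the anchor
    neg_mask = ~same         # negatives: different label

    # Hardest positive = farthest sample with the same label.
    hardest_pos = dist.masked_fill(~pos_mask, float("-inf")).max(dim=1).values
    # Hardest negative = closest sample with a different label.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Only average over anchors that have both a positive and a negative.
    valid = pos_mask.any(dim=1) & neg_mask.any(dim=1)
    losses = F.relu(hardest_pos - hardest_neg + margin)
    return losses[valid].mean()


# Usage: one forward pass over the batch, then one loss value.
embeddings = torch.randn(16, 128, requires_grad=True)  # stand-in for model output
labels = torch.randint(0, 4, (16,))
loss = batch_hard_triplet_loss(embeddings, labels, margin=1.0)
```

Running the model once per batch and mining on the resulting embeddings avoids three separate forward passes per anchor, which is one reason this form is often preferred.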

Hi there. Hmm, I’ll try to help you, but it’s really hard to answer “why doesn’t it work” questions without code.

What do you mean by this? You set your margin to e.g. 1.5 and the loss becomes 1.5?

Exactly. Also I updated my question. Many thanks for your reply.

Sorry for the delayed answer, was out of town.

Hmmm, when I use triplet loss, the loss (with the default reduction of ‘mean’) ends up way lower than my margin value, so a loss that stays stuck at the margin seems weird to me.
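One situation that produces a loss sitting exactly at the margin is a network that maps every image to (nearly) the same embedding: then d(a, p) equals d(a, n), and max(d(a, p) - d(a, n) + margin, 0) reduces to the margin. Whether that is what happens in this thread is an assumption, not something the posted code confirms. A minimal sketch, using nn.TripletMarginLoss as a stand-in for the TripletLoss class from the question:

```python
import torch
import torch.nn as nn

margin = 1.0
# Stand-in for the TripletLoss class in the question, whose code is not shown.
criterion = nn.TripletMarginLoss(margin=margin, p=2)

# Simulate a collapsed network: every input gets the same embedding.
collapsed = torch.ones(8, 128)
loss_collapsed = criterion(collapsed, collapsed.clone(), collapsed.clone())
print(loss_collapsed.item())   # equals the margin

# Well-separated embeddings: positives near the anchor, negatives far away.
anchor = torch.zeros(8, 128)
positive = anchor + 0.01
negative = anchor + 10.0
loss_separated = criterion(anchor, positive, negative)
print(loss_separated.item())   # zero, every triplet satisfies the margin
```

A quick check on a real run would be to print the spread of the embeddings in a batch (for example their standard deviation across the batch); values close to zero would point toward this kind of collapse.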

I’m doing a classification task with a training set of 20000 images over 1000 labels

I don’t understand this. Do you mean that you have 20 000 images and each of those images belongs to one of 1000 classes?

I don’t understand why you set your number of classes in AlexNet to 3. That means that you’d only have three output nodes. To the best of my knowledge, triplet loss is most often used when the number of classes is dynamic, to avoid retraining for every new class. This works better if you increase the number of output nodes to e.g. 1000. At prediction time you measure the similarity between images with e.g. nn.PairwiseDistance(p=2) to assign an image to a class.
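To make the prediction step concrete, here is a rough sketch of assigning a class by nearest embedding with nn.PairwiseDistance(p=2). The function classify_by_nearest and the "gallery" of labelled reference embeddings are hypothetical names for illustration; in practice the embeddings would come from the trained model.

```python
import torch
import torch.nn as nn

pdist = nn.PairwiseDistance(p=2)


def classify_by_nearest(query, gallery, gallery_labels):
    """Return the label of the gallery embedding closest to `query`.

    query: (D,) embedding, gallery: (N, D), gallery_labels: (N,).
    """
    # Compare the query with every gallery row; distances has shape (N,).
    distances = pdist(query.unsqueeze(0).expand_as(gallery), gallery)
    return gallery_labels[distances.argmin()]


# Toy example with 2-D embeddings and three labelled reference points.
gallery = torch.tensor([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
gallery_labels = torch.tensor([7, 3, 5])
query = torch.tensor([9.0, 11.0])
predicted = classify_by_nearest(query, gallery, gallery_labels)
```

Adding a new class then only means adding its reference embeddings to the gallery, with no retraining of the network.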

Oh and btw, you probably want to skip using the .data attribute. You can get the value with .item() (for a scalar such as the loss) or .detach().numpy() on the tensor. .data gives you a tensor that autograd does not track, so changes made through it can silently mess with the gradients.
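A small sketch of reading the loss value without .data. The tensor w and the toy loss are made up for illustration; only the .item() and .detach() calls are the point.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()              # scalar tensor attached to the autograd graph

running_loss = 0.0
running_loss += loss.item()        # plain Python float, safe to accumulate

# detach() first: calling .numpy() directly on a tensor that requires grad
# raises a RuntimeError.
loss_array = loss.detach().numpy()

loss.backward()                    # gradients are unaffected by the reads above
```

Accumulating loss.item() rather than the loss tensor itself also avoids keeping the autograd graph of every iteration alive.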

Did you solve your problem? I am facing the same problem.