Soft Margin Ranking Triplet Loss


I’ve implemented the following loss function, but in my last training run I noticed it is the reason my loss becomes NaN. When I changed the loss function to a hard triplet margin loss, the network trained with no issue. Could you please help me figure out where this loss might be wrong?

import torch
import torch.nn.functional as F

class SoftMarginRankingLoss(torch.nn.Module):
    def __init__(self, alpha=1):
        super(SoftMarginRankingLoss, self).__init__()
        self.alpha = alpha

    def forward(self, anchor, positive_match, negative_match):
        # Squared Euclidean distance between anchor and positive
        distance_pos = torch.square(F.pairwise_distance(anchor, positive_match, keepdim=True))
        # Squared Euclidean distance between anchor and negative
        distance_neg = torch.square(F.pairwise_distance(anchor, negative_match, keepdim=True))
        distance = distance_pos - distance_neg
        # Soft margin: log(1 + exp(alpha * (d_pos - d_neg)))
        loss = torch.log(1 + torch.exp(self.alpha * distance))
        # mean is required as backward() expects a scalar loss
        return loss.mean()

I don’t know which values distance is expected to take, but note that you could easily run into an overflow, creating an Inf loss and thus NaN gradients:

distance = torch.tensor(100., requires_grad=True)

loss = torch.log(1 + torch.exp(distance))
print(loss)
# tensor(inf, grad_fn=<LogBackward0>)

loss.backward()
print(distance.grad)
# tensor(nan)

Thanks so much for pointing that out. Is there a way to avoid this from happening?


Maybe instead of loss = torch.log(1 + torch.exp(self.alpha * distance)), you could use torch.logsumexp and stack in a dummy value of 0 alongside the quantity self.alpha * distance. The zero exponentiates to 1, thus mathematically giving you log(e^0 + e^{alpha*distance}) = log(1 + e^{alpha*distance}), but logsumexp computes it in a numerically stable way (it subtracts the maximum before exponentiating).
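A minimal sketch of that idea (the alpha value and the (N, 1) shape of distance are just illustrative assumptions, matching what pairwise_distance with keepdim=True would return):

import torch

alpha = 1.0
distance = torch.tensor([[100.0], [-3.0]], requires_grad=True)

# Stack a zero alongside alpha*distance on a new last dim, then
# reduce with logsumexp: log(exp(0) + exp(alpha * distance))
stacked = torch.stack([torch.zeros_like(distance), alpha * distance], dim=-1)
loss = torch.logsumexp(stacked, dim=-1)
print(loss)   # ≈ 100.0 and ≈ 0.0486 – both finite, no overflow

loss.mean().backward()
print(distance.grad)  # finite gradients, no NaN

Note that the plain torch.log(1 + torch.exp(...)) version would already have returned Inf for the first entry.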

Or, it looks like you could hack it together with torch.logaddexp(self.alpha * distance, torch.zeros(...)).
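Putting that together, here is a sketch of the full loss rewritten with torch.logaddexp; using torch.zeros_like(distance) for the zero tensor is one possible choice, and the shapes in the quick check at the bottom are made up for illustration:

import torch
import torch.nn.functional as F

class SoftMarginRankingLoss(torch.nn.Module):
    def __init__(self, alpha=1):
        super().__init__()
        self.alpha = alpha

    def forward(self, anchor, positive_match, negative_match):
        distance_pos = torch.square(F.pairwise_distance(anchor, positive_match, keepdim=True))
        distance_neg = torch.square(F.pairwise_distance(anchor, negative_match, keepdim=True))
        distance = distance_pos - distance_neg
        # logaddexp(x, 0) == log(exp(x) + exp(0)) == log(1 + exp(x)),
        # evaluated without overflowing for large x
        loss = torch.logaddexp(self.alpha * distance, torch.zeros_like(distance))
        return loss.mean()

# Quick check with hypothetical embeddings; the positives are scaled up
# so that distance_pos is large enough to overflow a naive exp()
loss_fn = SoftMarginRankingLoss(alpha=1)
anchor = torch.randn(4, 128, requires_grad=True)
positive = torch.randn(4, 128) * 100
negative = torch.randn(4, 128)
loss = loss_fn(anchor, positive, negative)
loss.backward()
print(loss, anchor.grad.isnan().any())  # finite loss, no NaN gradients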