# Custom loss function pollutes with NaNs after some time training

I wrote a custom vector similarity loss function as I wanted to experiment with different vector similarity heuristics. This is the class:

```python
class CosineLoss(torch.nn.Module):
    '''
    Loss calculated on the cosine distance between batches of vectors:
        loss = 1 - label * a.b / (|a|*|b|)
    '''

    def __init__(self):
        super(CosineLoss, self).__init__()

    def cosine_similarity(self, mat1, mat2):
        return mat1.unsqueeze(1).bmm(mat2.unsqueeze(2)).squeeze() / \
               (torch.norm(mat1, 2, 1) * torch.norm(mat2, 2, 1))

    def forward(self, input_tensor, target_tensor, labels):
        sim = self.cosine_similarity(input_tensor, target_tensor)
        loss = (1.0 - labels * sim).sum() / labels.size(0)
        return loss
```

This has very similar behaviour to `nn.CosineEmbeddingLoss`: it takes two tensors and a set of labels, and computes a positive or negative similarity loss depending on the sign of each label. One difference is that I have not used a margin (equivalent to `margin=0` in `nn.CosineEmbeddingLoss`). On two batches of vectors `enc` and `dec`, the loss calculation is:

```python
self.error_f = CosineLoss()
loss = self.error_f(enc, dec, labels) + \
       self.error_f(enc, dec[torch.randperm(batch_size)], -labels)
```

Here, I use the ground truth batch as a positive batch, and a shuffled batch as the negative batch (to avoid the easy minimum of zero-valued parameters). I am able to train successfully with this loss and begin to converge, but after some time (30–40 epochs on a small dataset) the loss becomes polluted with NaNs when calculating the negative batch loss (the second term above).
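As a quick sanity check of the claimed equivalence, the sketch below (variable names are illustrative, not from the thread) compares the custom formula against `nn.CosineEmbeddingLoss` for positive labels, where the two formulas coincide. Note that for `label = -1` they actually differ: the custom loss gives `1 + cos`, while `nn.CosineEmbeddingLoss` gives `max(0, cos - margin)`.

```python
import torch

torch.manual_seed(0)
enc = torch.randn(4, 8)
dec = torch.randn(4, 8)
labels = torch.ones(4)  # positive pairs only

# reference: built-in loss with margin=0, mean reduction (the default)
ref = torch.nn.CosineEmbeddingLoss(margin=0.0)(enc, dec, labels)

# manual computation with the same formula as the custom class
sim = torch.nn.functional.cosine_similarity(enc, dec, dim=1)
manual = (1.0 - labels * sim).sum() / labels.size(0)

assert torch.allclose(ref, manual, atol=1e-6)
```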

Using the cosine loss from the nn library I am able to train without NaNs. However, I don't see anything immediately wrong with my implementation.

Is there some trick I have missed that was used when implementing `nn.CosineEmbeddingLoss`?

Does adding an epsilon in your `cosine_similarity` function when you divide by the norms help? These norms can go to 0 during training, which would result in NaN values.
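A minimal sketch of that fix (the `eps` value here is an arbitrary choice, not taken from the thread; `torch.nn.functional.cosine_similarity` applies a similar safeguard internally):

```python
import torch

def cosine_similarity(mat1, mat2, eps=1e-8):
    # batched dot product: (B,1,D) @ (B,D,1) -> (B,1,1) -> (B,)
    dot = mat1.unsqueeze(1).bmm(mat2.unsqueeze(2)).squeeze()
    # eps keeps the denominator strictly positive even for zero-norm rows
    return dot / (torch.norm(mat1, 2, 1) * torch.norm(mat2, 2, 1) + eps)

# without eps, a zero-norm row yields 0/0 = NaN; with eps it yields 0
a = torch.zeros(2, 3)
b = torch.randn(2, 3)
print(cosine_similarity(a, b))  # tensor([0., 0.])
```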


@albanD adding an epsilon to the norms worked like a charm.

Thanks for the tip, great help!