Is there a loss function that measures the cross similarity between two 2D tensors?

Aldebaran · June 2, 2020, 12:26pm

Given two input tensors x1 and x2, both with shape [batch_size, hidden_size], let S be the similarity matrix over all pairs (predict, target), where predict is a row of x1, target is a row of x2, and both are dense vectors with shape [hidden_size].

Is there a loss function that is minimized when the diagonal values of S are close to 1 and the off-diagonal values are close to -1?

That is, the similarity between vectors of x1 and x2 with the same index must be greater than the similarity between vectors with different indices.
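One direct way to express this objective, sketched here under the assumption that S holds cosine similarities, is to regress S onto a target matrix with +1 on the diagonal and -1 elsewhere. The function name `diagonal_target_loss` is hypothetical, not an existing PyTorch API:

```python
import torch
import torch.nn.functional as F


def diagonal_target_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss: mean squared error between the cosine-similarity
    matrix S and a target with +1 on the diagonal and -1 elsewhere."""
    # Normalize each row to unit L2 norm so that x1 @ x2.T holds cosine similarities.
    s = F.normalize(x1, dim=1) @ F.normalize(x2, dim=1).t()  # [batch_size, batch_size]
    n = s.size(0)
    eye = torch.eye(n, device=s.device, dtype=s.dtype)
    target = 2 * eye - 1  # +1 on the diagonal, -1 off the diagonal
    return F.mse_loss(s, target)


# Collapsed embeddings (every vector identical): every entry of S is 1, so each of the
# 12 off-diagonal entries of a 4x4 matrix contributes (1 - (-1))**2 = 4, i.e. 48 / 16.
x = torch.ones(4, 8)
print(diagonal_target_loss(x, x))  # approximately 3.0
```

Note that this target can be met exactly only for batch_size ≤ 2: if S_ii = 1, the normalized x2[i] equals the normalized x1[i], S becomes a Gram matrix, and the mean off-diagonal cosine is then at least -1/(batch_size - 1).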

albanD · June 2, 2020, 2:21pm

Hi,

I don’t think there is one that does that.
But you can use torch.cdist() to compute all these distances at once, so it should be fairly simple to write one yourself.
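As a rough illustration of the `cdist` route (the tensor sizes and seed below are arbitrary assumptions): once the rows are unit-normalized, `torch.cdist` returns every pairwise Euclidean distance in one call, and the cosine-similarity matrix S can be recovered from it, because ||a - b||^2 = 2 - 2·cos(a, b) for unit vectors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, hidden_size = 5, 16
x1 = torch.randn(batch_size, hidden_size)
x2 = torch.randn(batch_size, hidden_size)

# Unit-normalize the rows so that distances and cosine similarities are linked.
u1 = F.normalize(x1, dim=1)
u2 = F.normalize(x2, dim=1)

# All pairwise Euclidean distances in one call: d[i, j] = ||u1[i] - u2[j]||.
d = torch.cdist(u1, u2, p=2.0)  # [batch_size, batch_size]

# For unit vectors ||a - b||^2 = 2 - 2 * cos(a, b), so S can be rebuilt from d.
s_from_dist = 1 - d.pow(2) / 2
s_direct = u1 @ u2.t()

print(torch.allclose(s_from_dist, s_direct, atol=1e-5))  # True
```

Any loss written in terms of S (for example, pulling the diagonal of d toward 0 and pushing the off-diagonal entries apart) can therefore be built on either matrix.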

Aldebaran · June 2, 2020, 3:16pm

Currently, I’ve implemented the following solution, inspired by the N-Pair Loss published at NIPS 2016:

```python
import torch
from torch import nn
from matplotlib import pyplot as plt
import seaborn as sn


class NPairsLoss(nn.Module):
    """
    The N-Pairs Loss.
    It measures the loss given predicted tensors x1, x2, both with shape [batch_size, hidden_size],
    and a target tensor y of class indices torch.arange(batch_size), i.e. row i of the similarity
    matrix should peak at column i (the identity matrix in one-hot form).
    """

    def __init__(self):
        super(NPairsLoss, self).__init__()
        self.ce = nn.CrossEntropyLoss()

    def show(self, similarity_scores):
        # Plot the similarity matrix as an annotated heatmap (moved to CPU in case it lives on a GPU).
        sn.heatmap(similarity_scores.detach().cpu().numpy(), annot=True, annot_kws={'size': 7}, vmin=-1.0, vmax=1.0)
        plt.show()

    def similarities(self, x1, x2):
        """
        Calculates the cosine similarity matrix for every pair (i, j),
        where i is an embedding from x1 and j is another embedding from x2.

        :param x1: a tensor with shape [batch_size, hidden_size].
        :param x2: a tensor with shape [batch_size, hidden_size].
        :return: the cosine similarity matrix with shape [batch_size, batch_size].
        """
        # L2-normalize each row so that the matrix product yields cosine similarities.
        x1 = x1 / torch.norm(x1, p=2, dim=1, keepdim=True)
        x2 = x2 / torch.norm(x2, p=2, dim=1, keepdim=True)
        return torch.matmul(x1, x2.t())

    def forward(self, predict, target):
        """
        Computes the N-Pairs Loss between the target and predictions.
        :param predict: the prediction of the model, a tuple (x1, x2) containing
            the batches x1 (image embeddings) and x2 (description embeddings).
        :param target: the class indices torch.arange(batch_size), with shape [batch_size].
        :return: N-Pairs Loss value.
        """
        x1, x2 = predict
        predict = self.similarities(x1, x2)
        self.show(predict)  # plt.show() may block on every forward pass; useful for debugging only
        # By construction, the probability distribution of each row must be concentrated
        # on the diagonal of the similarity matrix, so Cross Entropy can measure the loss.
        return self.ce(predict, target)
```
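Assuming the target is `torch.arange(batch_size)` (as in the usage snippet further down), `nn.CrossEntropyLoss` with its default mean reduction makes this module compute the following, with N = batch_size and S the cosine-similarity matrix returned by `similarities`:

$$
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} -\log\frac{\exp(S_{ii})}{\sum_{j=1}^{N}\exp(S_{ij})},
\qquad
S_{ij} = \frac{x_{1,i}\cdot x_{2,j}}{\lVert x_{1,i}\rVert\,\lVert x_{2,j}\rVert}\in[-1, 1]
$$

Because every S_ij lies in [-1, 1], the softmax logits are confined to a narrow range, which limits how peaked each row's distribution can become.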

However, with this loss the model ends up converging to a state where all the dense vectors are equal to each other. The following snippet compares the loss for this collapsed case, for the opposite case, and for the desired case:

```python
batch_size = 7
hidden_size = 768


def m_model(scenario=0):
    if scenario == 0:  # all vectors equal to each other (every cosine similarity is 1)
        p1 = torch.ones((batch_size, hidden_size))
        p2 = p1
    elif scenario == 1:  # every x2 vector opposite to every x1 vector (every cosine similarity is -1)
        p1 = torch.ones((batch_size, hidden_size))
        p2 = -1 * p1
    else:  # desired case: x2[i] equals x1[i], different indices hold different random vectors
        p1 = torch.rand((batch_size, hidden_size))
        p2 = p1

    return p1, p2


predict = m_model(scenario=0)
target = torch.arange(batch_size)
loss = NPairsLoss()  # the constructor takes no arguments

print("Loss:", loss(predict, target))
# Loss: tensor(1.9459), using scenario=0
# Loss: tensor(1.9459), using scenario=1
# Loss: tensor(1.7364), using scenario=2
```
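A small check of where these numbers come from, independent of the model and assuming batch_size = 7: when every similarity is equal, as in scenarios 0 and 1, each softmax row is uniform and the loss is log(batch_size), while even the ideal matrix with +1 on the diagonal and -1 elsewhere only gets down to log(1 + (batch_size - 1)·e^(-2)). The last lines try a temperature τ, as used in InfoNCE-style contrastive losses; τ = 0.1 is an assumption, not something from the original post:

```python
import math

import torch
import torch.nn.functional as F

batch_size = 7
target = torch.arange(batch_size)  # row i should peak at column i

# Scenarios 0 and 1: every cosine similarity is +1 (or -1), so each row of logits is
# constant, the softmax is uniform, and the loss equals log(batch_size).
all_plus = F.cross_entropy(torch.full((batch_size, batch_size), 1.0), target)
all_minus = F.cross_entropy(torch.full((batch_size, batch_size), -1.0), target)
print(round(all_plus.item(), 4), round(all_minus.item(), 4))  # 1.9459 1.9459

# Ideal cosine matrix: +1 on the diagonal, -1 elsewhere. This is a lower bound on the
# loss when the logits are raw cosine similarities.
ideal = 2 * torch.eye(batch_size) - 1
best = F.cross_entropy(ideal, target)
print(round(best.item(), 4), round(math.log(1 + (batch_size - 1) * math.exp(-2)), 4))

# Hypothetical temperature tau (an assumption): dividing the logits by tau < 1
# lets the softmax become much sharper for the same similarities.
tau = 0.1
best_scaled = F.cross_entropy(ideal / tau, target)
print(best_scaled.item())  # close to 0
```

Scaling the logits by 1/τ leaves the collapsed loss at log(batch_size), since constant rows stay constant, but lowers the reachable minimum, so the gap between the collapsed and the desired cases becomes much wider.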

Any suggestions on how to penalize these scenarios where the similarity matrix has all the same values?