Is there a loss function that measures the cross similarity between two 2D tensors?

Given two input tensors x1 and x2 with the shape [batch_size, hidden_size], let S be the matrix of similarity between all pairs (predict, target), where predict and target are dense vectors with the shape [hidden_size] and predict belongs to x1 and target belongs to x2.

enter image description here

Is there any loss function that is minimized as the values in the diagonal of S are close to 1 while the other values are close to -1?

That is, the similarity between the vectors from x1 and x2 of the same index must be greater than between vectors of different indexes.

Hi,

I don’t think there is one that does that.
But you can use .cdist() to compute all these distance at once. So it should be fairly simple to do one yourself.

Currently, I’ve implemented the following solution inpired by N-Pair Loss published from NIPS 2016:

import torch
from torch import nn
from matplotlib import pyplot as plt
import seaborn as sn


class NPairsLoss(nn.Module):
    """
    The N-Pairs Loss.
    It measures the loss given predicted tensors x1, x2 both with shape [batch_size, hidden_size],
    and target tensor y which is the identity matrix with shape  [batch_size, batch_size].
    """

    def __init__(self):
        super(NPairsLoss, self).__init__()
        self.ce = nn.CrossEntropyLoss()

    def show(self, similarity_scores):
        sn.heatmap(similarity_scores.detach().numpy(), annot=True, annot_kws={'size': 7}, vmin=-1.0, vmax=1.0)
        plt.show()

    def similarities(self, x1, x2):
        """
        Calculates the cosine similarity matrix for every pair (i, j),
        where i is an embedding from x1 and j is another embedding from x2.

        :param x1: a tensors with shape [batch_size, hidden_size].
        :param x2: a tensors with shape [batch_size, hidden_size].
        :return: the cosine similarity matrix with shape [batch_size, batch_size].
        """
        x1 = x1 / torch.norm(x1, dim=1, keepdim=True)
        x2 = x2 / torch.norm(x2, p=2, dim=1, keepdim=True)
        return torch.matmul(x1, x2.t())

    def forward(self, predict, target):
        """
        Computes the N-Pairs Loss between the target and predictions.
        :param predict: the prediction of the model,
        Contains the batches x1 (image embeddings) and x2 (description embeddings).
        :param target: the identity matrix with shape  [batch_size, batch_size].
        :return: N-Pairs Loss value.
        """
        x1, x2 = predict
        predict = self.similarities(x1, x2)
        self.show(predict)
        # by construction the probability distribution must be concentrated on the diagonal of the similarities matrix.
        # so, Cross Entropy can be used to measure the loss.
        return self.ce(predict, target)

However, with this loss, the model ends up converging to a scenario where all dense vectors are equal to each other. Which can be seen by executing the following code snippet:

batch_size=7
hidden_size=768
def m_model(scenario=0):
    if scenario == 0: # all equal all
        p1 = torch.ones((batch_size, hidden_size)) 
        p2 = p1
    elif scenario == 1: # all different all
        p1 = torch.ones((batch_size, hidden_size))
        p2 = -1*p1
    else: # desired case
        p1 = torch.rand((batch_size, hidden_size))
        p2=p1

    return p1, p2

predict = m_model(scenario=0)
target = torch.arange(batch_size)
loss = NPairsLoss(1)

print("Loss:", loss(predict, target))
# Loss: tensor(1.9459), using scenario=0
# Loss: tensor(1.9459), using scenario=1
# Loss: tensor(1.7364), using scenario=2

Any suggestions on how to penalize these scenarios where the similarity matrix has all the same values?