Pairwise similarity matrix between a set of vectors

nullgeppetto · February 29, 2020, 1:37am

Let’s suppose that we have a 3D tensor, where the first dimension represents the batch_size, as follows:

import torch
import torch.nn as nn
x = torch.randn(32, 100, 25)

That is, for each i, x[i] is a set of 100 25-dimensional vectors. I would like to compute the similarity (e.g., the cosine similarity – but in general any such pairwise distance/similarity matrix) of these vectors for each batch item.

That is, for each batch item x[t] I need to compute a [100, 100] matrix containing the pairwise similarities of the above vectors. More specifically, the (i, j)-th element of this matrix should contain the similarity (or the distance) between the i-th and the j-th rows of the 100x25 matrix x[t], for all t = 0, ..., batch_size - 1.

If I use torch.nn.CosineSimilarity(), no matter what dim I’m using, the result is [100, 25] (dim=0), [32, 25] (dim=1), or [32, 100] (dim=2), whereas I need a tensor of size [32, 100, 100]. I would expect torch.nn.CosineSimilarity() to produce the pairwise matrix (since, at least to me, that looks more intuitive), but it doesn’t.
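To make the shape behaviour concrete, here is a small check, assuming the module is called with x as both inputs (the post does not say exactly how it was called). It only illustrates that the module compares matching positions along the other dimensions rather than forming all pairs:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 100, 25)

# nn.CosineSimilarity reduces over `dim` and compares the two inputs
# position by position along the remaining dimensions; it never pairs
# every row with every other row.
shapes = {d: tuple(nn.CosineSimilarity(dim=d)(x, x).shape) for d in range(3)}
print(shapes)  # {0: (100, 25), 1: (32, 25), 2: (32, 100)}
```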

Could that be done using something like below?

torch.matmul(x, x.permute(0, 2, 1))
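For the cosine case specifically, one possible sketch is to L2-normalize the vectors first, so that the batched matrix product of dot products becomes a matrix of cosines. The names x_unit and cos_sim are only illustrative:

```python
import torch
import torch.nn.functional as F

x = torch.randn(32, 100, 25)

# L2-normalize each 25-dimensional vector so that dot products become cosines.
x_unit = F.normalize(x, p=2, dim=-1)

# Batched matrix product: [32, 100, 25] @ [32, 25, 100] -> [32, 100, 100].
cos_sim = torch.matmul(x_unit, x_unit.transpose(1, 2))
print(cos_sim.shape)  # torch.Size([32, 100, 100])
```

Here cos_sim[t, i, j] should match the cosine similarity between x[t, i] and x[t, j] up to floating-point error, and the diagonal of each cos_sim[t] should be close to 1.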

I guess that this gives a matrix of pairwise dot products (which would be the cosine similarity if the vectors were L2-normalized first), but what if I need an arbitrary pairwise operation? Should I build such an operation on top of the above?
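For an arbitrary pairwise operation, one option (a sketch, not necessarily the most memory-efficient route) is to broadcast the rows against each other and reduce over the feature dimension. The helper pairwise below is hypothetical, not a PyTorch function; the L1 distance is used only as an example and is compared against the built-in torch.cdist:

```python
import torch

def pairwise(x, fn):
    """Apply fn to every pair of rows within each batch item.

    Hypothetical helper. x has shape [B, N, D]; fn takes two broadcastable
    tensors and must reduce over the last dimension.
    """
    a = x.unsqueeze(2)  # [B, N, 1, D]
    b = x.unsqueeze(1)  # [B, 1, N, D]
    return fn(a, b)     # [B, N, N] after fn reduces the last dim

x = torch.randn(32, 100, 25)

# Example operation: L1 (Manhattan) distance between every pair of rows.
l1 = pairwise(x, lambda a, b: (a - b).abs().sum(dim=-1))
print(l1.shape)  # torch.Size([32, 100, 100])

# For p-norm distances, torch.cdist computes the same thing directly.
l1_builtin = torch.cdist(x, x, p=1)
```

Note that broadcasting materializes a [B, N, N, D] intermediate, so for large N a matmul-based formulation or torch.cdist is likely to be cheaper where it applies.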

Thank you.