Efficient Computation of Distance Metrics Between Different-Sized Tensors

Here’s the basic idea. I want to compute some distance metric (Euclidean, cosine, etc.) between a vector and a matrix of vectors. In the case of a single vector, this is fairly straightforward.

a1 = torch.randn(1,5)
a2 = torch.randn(12,5)
distances = distance_function(a1.expand_as(a2), a2)  # returns a vector of size 12
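
For concreteness, here is one possible sketch of what distance_function could be, using row-wise Euclidean distance via F.pairwise_distance (distance_function itself is just the placeholder name from the snippet above):

import torch
import torch.nn.functional as F

def distance_function(x1, x2):
    # L2 distance between corresponding rows of x1 and x2
    return F.pairwise_distance(x1, x2, p=2)

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)
distances = distance_function(a1.expand_as(a2), a2)  # shape: (12,)

(In recent PyTorch versions F.pairwise_distance broadcasts, so for this particular function a1 could even be passed directly without the expand_as.)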

However, things get tricky in the batch case, where the sizes of the second matrices can vary. For example:

a1 = torch.randn(1,5)
b1 = torch.randn(1,5)
c1 = torch.randn(1,5)

a2 = torch.randn(12,5)
b2 = torch.randn(7,5)
c2 = torch.randn(2,5)

I’d like to compute all the distance comparisons (a1 to a2, b1 to b2, c1 to c2, etc.) in a way that is fast and memory efficient. Memory efficiency is important because the second matrices (a2, b2, c2) can sometimes be large.

The first approach that comes to mind is something like this:

distances = distance_function(torch.cat([a1.expand_as(a2), b1.expand_as(b2), c1.expand_as(c2)]),
                              torch.cat([a2, b2, c2]))

This works, but I have concerns about the memory implications. I know expand_as does not allocate new memory. However (to my knowledge), torch.cat copies its inputs into a newly allocated contiguous tensor, so the expanded views get materialized at their full size.
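
A quick way to see the difference is to inspect the strides (a minimal sketch; the sizes are arbitrary):

x = torch.randn(1, 5)
v = x.expand(1000, 5)
print(v.stride())   # (0, 1): the expanded dim has stride 0, so no new memory
y = torch.cat([v, v])
print(y.stride())   # (5, 1): a contiguous, freshly allocated 2000 x 5 tensor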

Is there a better/more efficient approach to this?

If you don’t have too many of these Tensors, I think a for loop around your original distance function above is the best approach.
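
Something like this (a minimal sketch, reusing the distance_function and tensors from your example):

pairs = [(a1, a2), (b1, b2), (c1, c2)]
distances = [distance_function(x1.expand_as(x2), x2) for x1, x2 in pairs]
# distances is a list of tensors of sizes 12, 7, and 2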

If you have very many of them, then the cat operation will be faster (though more memory-hungry, as you mentioned).
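
And if you do go the cat route, you can split the flat result back into per-pair vectors with torch.split (sizes taken from your example):

flat = distance_function(
    torch.cat([a1.expand_as(a2), b1.expand_as(b2), c1.expand_as(c2)]),
    torch.cat([a2, b2, c2]),
)
dist_a, dist_b, dist_c = torch.split(flat, [12, 7, 2])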
