Here’s the basic idea. I want to compute some distance metric (Euclidean, cosine, etc.) between a vector and a matrix of vectors. In the case of a single vector, this is fairly straightforward.

```
a1 = torch.randn(1,5)
a2 = torch.randn(12,5)
distances = distance_function(a1.expand_as(a2), a2) # returns vector of size 12
```
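For concreteness, here is a runnable sketch of the above, using `torch.nn.functional.pairwise_distance` (Euclidean, p=2) as a stand-in for the generic `distance_function`:

```python
import torch
import torch.nn.functional as F

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)

# expand_as returns a view (no copy), so this broadcasts a1 against
# every row of a2 without allocating a (12, 5) tensor.
distances = F.pairwise_distance(a1.expand_as(a2), a2)
print(distances.shape)  # torch.Size([12])
```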

However, things get tricky in the batch case, where the sizes of the second matrices can vary. For example:

```
a1 = torch.randn(1,5)
b1 = torch.randn(1,5)
c1 = torch.randn(1,5)
a2 = torch.randn(12,5)
b2 = torch.randn(7,5)
c2 = torch.randn(2,5)
```

I’d like to compute all the distance comparisons (a1 to a2, b1 to b2, c1 to c2, etc.) in a way that is fast and memory-efficient. Memory efficiency is important because the x_2 matrices can sometimes be large.
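The obvious baseline is a Python loop over the pairs (again using `pairwise_distance` as a hypothetical stand-in for `distance_function`), which is correct but launches one kernel per pair and so gets slow when there are many pairs:

```python
import torch
import torch.nn.functional as F

# Variable-length second matrices, as in the example above.
pairs = [(torch.randn(1, 5), torch.randn(n, 5)) for n in (12, 7, 2)]

# One distance call per pair -- simple and memory-friendly,
# but the per-pair overhead adds up for large batches.
distances = [F.pairwise_distance(x1.expand_as(x2), x2) for x1, x2 in pairs]
sizes = [d.shape[0] for d in distances]  # [12, 7, 2]
```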

The first approach that comes to mind is something like this:

```
distances = distance_function(
    torch.cat([a1.expand_as(a2), b1.expand_as(b2), c1.expand_as(c2)]),
    torch.cat([a2, b2, c2]),
)
```

This works, but I have concerns about the memory implications. I know `expand_as` does not allocate new memory. However, to my knowledge, `torch.cat` will allocate a new tensor and copy its inputs into it.
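A quick sanity check of both claims, comparing storage pointers (the shapes here are just illustrative):

```python
import torch

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)

# expand_as is a view: same underlying storage, no copy.
expanded = a1.expand_as(a2)
assert expanded.data_ptr() == a1.data_ptr()

# torch.cat materializes a new contiguous tensor: fresh storage,
# so the expanded rows really do get copied out at this point.
catted = torch.cat([expanded, a2])
assert catted.data_ptr() != a1.data_ptr()
```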

Is there a better/more efficient approach to this?