# Efficient Computation of Distance Metrics Between Different Sized Tensors

Here’s the basic idea: I want to compute a distance metric (Euclidean, cosine, etc.) between a vector and a matrix of vectors. In the single-vector case, this is fairly straightforward.

```python
import torch

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)
distances = distance_function(a1.expand_as(a2), a2)  # returns a vector of size 12
```
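For concreteness, `distance_function` here could be any row-wise metric; a minimal sketch assuming Euclidean distance (the implementation below is an assumption, not part of the original post):

```python
import torch

def distance_function(x1, x2):
    # assumed implementation: Euclidean distance between corresponding rows
    return torch.norm(x1 - x2, dim=1)

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)
distances = distance_function(a1.expand_as(a2), a2)
print(distances.shape)  # torch.Size([12])
```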

However things get tricky in the batch case where the sizes of the second matrix can vary. For example:

```python
a1 = torch.randn(1, 5)
b1 = torch.randn(1, 5)
c1 = torch.randn(1, 5)

a2 = torch.randn(12, 5)
b2 = torch.randn(7, 5)
c2 = torch.randn(2, 5)
```

I’d like to compute all of the distance comparisons (a1 to a2, b1 to b2, c1 to c2, etc.) in a way that is fast and memory efficient. Memory efficiency matters because the second matrices (a2, b2, c2) can sometimes be large.

The first approach that comes to mind is something like this:

```python
distances = distance_function(
    torch.cat([a1.expand_as(a2), b1.expand_as(b2), c1.expand_as(c2)]),
    torch.cat([a2, b2, c2]),
)
```

This works, but I have concerns about the memory implications. I know `expand_as` does not allocate new memory on its own; however, to my knowledge, `torch.cat` materializes its inputs, so the expanded copies end up allocated anyway.
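For what it’s worth, if the metric happens to be Euclidean, `torch.cdist` computes all pairwise distances directly from the un-expanded tensors, so no expanded copy is ever materialized; a sketch under that assumption:

```python
import torch

a1 = torch.randn(1, 5)
a2 = torch.randn(12, 5)

# cdist returns a (1, 12) matrix of pairwise Euclidean distances;
# squeezing the first dim gives the same size-12 vector as the expand_as approach
distances = torch.cdist(a1, a2).squeeze(0)
print(distances.shape)  # torch.Size([12])
```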

Is there a better/more efficient approach to this?

If you don’t have too many of these tensors, I think a for loop around your original distance function above is the best option. If you have very many of them, then the `cat` operation will be faster (though more memory hungry, as you mentioned).
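The for-loop version might look like the following sketch (the `distance_function` body is an assumed Euclidean metric, since the original post doesn’t define it):

```python
import torch

def distance_function(x1, x2):
    # assumed Euclidean row-wise distance
    return torch.norm(x1 - x2, dim=1)

pairs = [(torch.randn(1, 5), torch.randn(12, 5)),
         (torch.randn(1, 5), torch.randn(7, 5)),
         (torch.randn(1, 5), torch.randn(2, 5))]

# one small temporary per pair instead of one big concatenated tensor
distances = [distance_function(x1.expand_as(x2), x2) for x1, x2 in pairs]
print([d.shape[0] for d in distances])  # [12, 7, 2]
```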
