# What is the most efficient way to get the closest pair of a set of vectors?

I have a matrix M in size of n*d, where n is the number of vectors and d is the dimension of each vector. Now, I want to find the closest pair of this n vectors. Be note, I only need the closest pair.
I have two ways of doing this:

1. Use the `PairwiseDistance` and `argmin` function. However, this method uses too much memory. Half of the returned matrix is useless.
2. `functional.pdist` works well, it does not double computing. However, this function returns a condensed vector. I do not know how to recover the (i,j) index from this condensed vector.

The input matrix M changes and need to recompute the closest pair in every iteration. So, I want the whole process to be as efficient as possible.

Any suggestions about solving this efficiently?

I solved this by recovering the `(i,j)` index from the `k` index in the `functional.pdist`.
The code:

``````def get_index(k, n):
p = n-1
kk = k
j = 1
i = 0
while kk >= p:
kk = kk-p
p = p-1
j += 1
i += 1
return i, j+kk

``````

Hi Fly!

In practice, for reasonable values of `n` and `d`, I expect that
`pdist` will do about as well as you can.

However, the so-called computational cost of your `pdist`
solution is a suboptimal `O(d * n^2)`. If you search on
closest-pair problem, you will see that your problem can
be solved in `O(d * n * log n)` time. However, these
“faster” algorithms don’t vectorize naturally, so you give
up much of the benefit of using a gpu (and even pipelines
in a cpu). Thus your “naive” `pdist` solution may well run
faster (even as it requires more operations).

Good luck.

K. Frank

1 Like