I have a matrix M in size of n*d, where n is the number of vectors and d is the dimension of each vector. Now, I want to find the closest pair of this n vectors. Be note, I only need the closest pair.
I have two ways of doing this:
Use the PairwiseDistance and argmin function. However, this method uses too much memory. Half of the returned matrix is useless.
functional.pdist works well, it does not double computing. However, this function returns a condensed vector. I do not know how to recover the (i,j) index from this condensed vector.
The input matrix M changes and need to recompute the closest pair in every iteration. So, I want the whole process to be as efficient as possible.
In practice, for reasonable values of n and d, I expect that pdist will do about as well as you can.
However, the so-called computational cost of your pdist
solution is a suboptimal O(d * n^2). If you search on closest-pair problem, you will see that your problem can
be solved in O(d * n * log n) time. However, these
“faster” algorithms don’t vectorize naturally, so you give
up much of the benefit of using a gpu (and even pipelines
in a cpu). Thus your “naive” pdist solution may well run
faster (even as it requires more operations).