How to achieve efficient greedy Hamiltonian path construction?

I have N C-dimensional points stored in a tensor Data of shape (C, N). Given a starting point specified by its column index in Data (e.g. 0), I want to implement an efficient greedy Hamiltonian path construction algorithm as follows:

1. Initialize an empty list S = [] and the remaining set R = {all N data points};
2. Move the starting point from R to S;
3. In a loop, pick points from R one by one: in each round, find the point in R nearest to the last point in S (measured by L2 distance), remove it from R, and append it to S;
4. When R is empty, return S and stop.

where each point is represented by its corresponding index in Data (i.e. a number in {0, 1, …, N-1}), and the returned S is actually a permutation which can be applied to Data to get the ordered result. :sweat_smile:
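For concreteness, "applying" S to Data just means fancy indexing along the column dimension (a toy example with made-up values):

```python
import torch

C, N = 3, 5
Data = torch.randn(C, N)
S = torch.tensor([0, 3, 1, 4, 2])  # e.g. a permutation returned by the algorithm
ordered = Data[:, S]               # shape (C, N), columns reordered along the path
```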

It’s relatively easy to implement the above algorithm with for-loops, but I’m looking for a more efficient way based on PyTorch. The trivial implementation has O(N^2) time complexity, and I’m wondering whether this can be achieved in O(N*log(N)) time and O(N) memory. :cold_sweat:
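For reference, here is roughly what the for-loop baseline looks like (a minimal sketch with my own variable names; each round scans all remaining points, so it is O(N^2) overall):

```python
import torch

def greedy_path(Data: torch.Tensor, start: int = 0) -> torch.Tensor:
    """Naive greedy construction; Data has shape (C, N), returns S as a LongTensor."""
    C, N = Data.shape
    remaining = torch.ones(N, dtype=torch.bool)  # mask for R
    remaining[start] = False
    order = [start]                              # S
    last = start
    for _ in range(N - 1):
        # squared L2 distances from the last picked point to all points still in R
        diff = Data[:, remaining] - Data[:, last:last + 1]
        dists = diff.pow(2).sum(dim=0)
        idx_in_R = torch.argmin(dists)
        # map the argmin back to the original column index
        nxt = torch.nonzero(remaining, as_tuple=False)[idx_in_R].item()
        remaining[nxt] = False
        order.append(nxt)
        last = nxt
    return torch.tensor(order)
```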

Does there exist an API, or a combination of fast operations, to achieve this? :thinking:

(When the point dimensionality equals 1, the result can be obtained directly from torch.Tensor.sort(descending=True).indices, if I choose the largest value as the starting point.)
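For example (a toy sketch of the 1-D shortcut):

```python
import torch

# 1-D special case: starting from the largest value, the greedy path simply
# walks down the sorted order, so a descending sort gives S directly.
x = torch.randn(1, 7)                        # Data with C = 1
S = x.squeeze(0).sort(descending=True).indices
```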