I have N C-dimensional points forming a tensor `Data` with shape `(C, N)`. Given a starting point specified by its index in `Data` (e.g. 0), I want to implement an efficient greedy Hamiltonian-path construction algorithm as follows:

```
1. Initialize an empty list S = [] and the remaining set R = {all N data points};
2. Move the starting point from R to S;
3. In a loop, pick points one by one from R to S: in each round, find the
   point in R nearest to the last point in S (measured by L2 distance),
   remove it from R and append it to S;
4. When R is empty, return S and end.
```
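For reference, a straightforward for-loop implementation of the steps above could look like this (a sketch; the function name `greedy_path` and its interface are my own choices, not an established API). It keeps O(N) extra memory by computing distances from the current point on the fly:

```python
import torch

def greedy_path(data: torch.Tensor, start: int = 0) -> torch.Tensor:
    """Greedy nearest-neighbor ordering of the columns of `data` (shape (C, N)).

    Returns a permutation of {0, ..., N-1} as a LongTensor.
    O(N^2) time, O(N) extra memory: each round computes the distances
    from the current point to all points in one vectorized call.
    """
    C, N = data.shape
    remaining = torch.ones(N, dtype=torch.bool)   # membership mask for R
    order = torch.empty(N, dtype=torch.long)      # the list S
    order[0] = start
    remaining[start] = False
    cur = start
    for i in range(1, N):
        # Squared L2 distances from the current point to every point;
        # points already moved to S get +inf so they are never selected.
        d2 = ((data - data[:, cur:cur + 1]) ** 2).sum(dim=0)
        d2[~remaining] = float("inf")
        cur = int(torch.argmin(d2))
        order[i] = cur
        remaining[cur] = False
    return order
```

The inner distance computation is vectorized, but the outer loop still runs N-1 times in Python, which is exactly the overhead I would like to eliminate.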

where each point is represented by its index in `Data` (i.e. a number in {0, 1, …, N-1}), and the returned S is a permutation that can be applied to `Data` to get the ordered result.

It’s relatively easy to implement the above algorithm with for-loops, but I am pursuing a more efficient PyTorch-based approach. The trivial implementation takes O(N^2) time, and I’m wondering whether this can be achieved in O(N·log(N)) time with O(N) memory.
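One partially-vectorized variant I have considered (my own sketch, not a known solution) precomputes the full pairwise distance matrix with `torch.cdist`, so each greedy step reduces to a single masked `argmin`. This trades memory for speed: the time complexity stays O(N^2) and the memory grows to O(N^2), so it still falls short of the target:

```python
import torch

def greedy_path_cdist(data: torch.Tensor, start: int = 0) -> torch.Tensor:
    """Same greedy ordering, but with a precomputed (N, N) distance matrix.

    O(N^2) time and O(N^2) memory; each step is one argmin over a row.
    """
    N = data.shape[1]
    dist = torch.cdist(data.T, data.T)       # rows of data.T are points
    dist.fill_diagonal_(float("inf"))        # a point is never its own neighbor
    order = torch.empty(N, dtype=torch.long)
    order[0] = start
    dist[:, start] = float("inf")            # remove the start point from R
    cur = start
    for i in range(1, N):
        cur = int(torch.argmin(dist[cur]))   # nearest remaining point
        order[i] = cur
        dist[:, cur] = float("inf")          # remove it from R
    return order
```

Because step i depends on the point chosen in step i-1, the loop itself seems inherently sequential, which is why I am asking whether some clever operation can avoid it.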

Does there exist an API, or a combination of fast operations, to achieve this?

(When the point dimensionality equals 1, the result can be obtained directly from `torch.Tensor.sort().indices`, if I choose the largest value as the starting point.)
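To make the 1-D special case concrete (the example values are arbitrary): starting from the largest value, every remaining point lies below the current one, so the nearest remaining point is always the next-largest, and the greedy path is just a descending sort:

```python
import torch

# 1-D case: from the largest starting point, the greedy path
# visits values in descending order, so sorting suffices.
x = torch.tensor([0.3, 2.0, -1.5, 0.9])
order = x.sort(descending=True).indices
# order is [1, 3, 0, 2], i.e. it visits 2.0, 0.9, 0.3, -1.5
```

This runs in O(N·log(N)) time and O(N) memory, which is what makes me hope a comparably cheap construction exists for C > 1.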