How to quickly separate a tensor into k subtensors with specific value ranges?

Here is a problem about how to split the original tensor into k subtensors, each covering a restricted value range.
The example is shown below.
There is a list of the values 1 to 8:

[1,5,3,2,6,7,8,4]

I want to split it into 2 subtensors like this:

[1,4,3,2]
[6,7,8,5]

I only care about the order between the subtensors, not the order within them.
I hope to find an efficient way to do this.
One way is to find the k-th value of the tensor with torch.kthvalue and then use torch.masked_select to extract the matching elements.
But torch.kthvalue can be expensive when the tensor is very large.
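Concretely, for the k = 2 example above, that approach would be something like this minimal sketch:

import torch

x = torch.tensor([1., 5., 3., 2., 6., 7., 8., 4.])
# kthvalue returns the k-th smallest element; use it as the split threshold.
threshold = torch.kthvalue(x, k=x.numel() // 2).values
lower = torch.masked_select(x, x <= threshold)  # tensor([1., 3., 2., 4.])
upper = torch.masked_select(x, x > threshold)   # tensor([5., 6., 7., 8.])

(With duplicate values at the threshold, the two parts may not come out exactly equal in size.)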
The other way is to use a shell sort, but there is no efficient shell sort implementation in Python.
Could you please give me some suggestions?

What is the speed difference you are seeing with this approach compared to, e.g., calling topk multiple times?

Actually, if I use torch.topk, I need to call it multiple times.
For example, if I want to separate the tensor into four subtensors, k should first be set to total_size / 4, and then I have to mask or otherwise remove the elements chosen in the first call before the next one.
Like this:

input = torch.rand([16])
src1, index1 = torch.topk(input, k=4, sorted=False)
input = input.index_fill(0, index1, -1e9)  # index_fill takes (dim, index, value); -1e9 masks out the picked elements
...
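Expanded to produce all of the subtensors, the idea would look roughly like the sketch below (the sizes are arbitrary, and float('-inf') stands in for the -1e9 fill above):

import torch

def split_by_topk(x, k):
    # Repeatedly take the largest remaining chunk, then mask it out.
    # Assumes x is 1-D and x.numel() is divisible by k.
    x = x.clone()
    chunk = x.numel() // k
    parts = []
    for _ in range(k):
        vals, idx = torch.topk(x, k=chunk, sorted=False)
        parts.append(vals)
        x = x.index_fill(0, idx, float('-inf'))
    return parts  # parts[0] holds the largest values, parts[-1] the smallest

parts = split_by_topk(torch.rand([16]), k=4)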

Is there a way to directly select the middle part of the tensor, please?

Did you compare the timing of torch.sort() with this method of calling torch.topk() multiple times? How does torch.sort() perform in comparison?

Actually, I have tested torch.sort.
It performs better than torch.topk when I want to split the tensor into more subtensors.
But it is still quite slow on the CPU.
What confuses me is that sorting a tensor of size 12845056 takes about 2 seconds.
Both torch.topk and torch.sort cost a lot more than a linear or a convolutional layer.
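A rough sketch of the kind of comparison I mean (the linear layer size here is arbitrary, not my actual model):

import time
import torch

x = torch.rand(12845056)

start = time.perf_counter()
parts = torch.sort(x).values.chunk(4)  # sort once, then cut into 4 value-ordered chunks
sort_time = time.perf_counter() - start

layer = torch.nn.Linear(128, 128)
inp = x.reshape(-1, 128)  # 12845056 = 100352 * 128
with torch.no_grad():
    start = time.perf_counter()
    layer(inp)
    layer_time = time.perf_counter() - start

print(f"sort + chunk: {sort_time:.3f}s, linear layer: {layer_time:.3f}s")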
Why does this happen?