C++ extension for taking multiple slices in one operation

Hi! I have a sparse indexing operation that I want to accelerate. Here is a dummy example:

import torch

x = torch.rand(10)
indices = [2, 3, 4, 6, 7]
out = x[indices]

As you can see, some of the indices are consecutive.
In my real-world use case, it may make sense to group consecutive indices into ranges:

ranges = [(2,5), (6,8)]  # tuples (start, end) like with range()
out = torch.cat([x[a:b] for a, b in ranges])
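
For context, this is roughly how the ranges can be built from the index list. It is only a minimal sketch, assuming the indices are sorted and unique; `indices_to_ranges` and `ranges_to_indices` are illustrative helper names, not part of PyTorch.

```python
def indices_to_ranges(indices):
    """Group a sorted list of unique ints into half-open (start, end) ranges."""
    ranges = []
    for i in indices:
        if ranges and ranges[-1][1] == i:
            # i directly follows the current run, so extend that run by one
            ranges[-1] = (ranges[-1][0], i + 1)
        else:
            # i does not continue the previous run, so start a new one
            ranges.append((i, i + 1))
    return ranges


def ranges_to_indices(ranges):
    """Expand half-open (start, end) ranges back into a flat index list."""
    return [i for a, b in ranges for i in range(a, b)]


indices = [2, 3, 4, 6, 7]
ranges = indices_to_ranges(indices)
print(ranges)  # [(2, 5), (6, 8)]
```

Expanding the ranges again with `ranges_to_indices` gives back the original index list, which is a cheap way to check that both selection methods address the same elements.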

My rationale is to reduce the number of memcpy operations (or their CUDA equivalent) performed behind the scenes. Written naively, as in the snippet above, it’s unsurprisingly much slower than the plain x[indices] lookup.

I tried writing a cpp extension for this:

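#include <torch/extension.h>
#include <utility>
#include <vector>

// Concatenate the slices arr[start:end] along dim 0, one per (start, end) pair.
torch::Tensor select_ranges(const torch::Tensor& arr, const std::vector<std::pair<int, int>>& ranges) {
    std::vector<torch::Tensor> chunks;
    chunks.reserve(ranges.size());
    for (const auto& range : ranges) {
        chunks.push_back(arr.slice(0, range.first, range.second));  // slice() returns a view
    }
    return torch::cat(chunks);  // cat() copies the views into a new tensor
}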

…but it’s even slower!

Do you have any suggestions for improving this?
Or can you convince me that it’s a bad idea altogether? :)
Thanks!
