Hi! I have a sparse indexing operation that I want to accelerate. Here is a dummy example:
import torch

x = torch.rand(10)
indices = [2, 3, 4, 6, 7]
out = x[indices]
As you can see, some indices are consecutive.
In my real-world use case, it may make sense to group the indices into ranges:
ranges = [(2, 5), (6, 8)]  # (start, end) tuples, half-open like range()
out = torch.cat([x[a:b] for a, b in ranges])
My rationale is to reduce the number of memcpy operations (or the CUDA equivalent) issued behind the scenes. Written naively as in the snippet above, though, it's unsurprisingly much slower than plain fancy indexing.
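For reference, here's roughly how I'm timing the two versions (a minimal sketch; the tensor size and the runs-of-two index pattern are arbitrary choices of mine, just so there are consecutive runs to group):

import torch
import torch.utils.benchmark as benchmark

x = torch.rand(1_000_000)
# Hypothetical pattern: a run of two consecutive indices every six elements
ranges = [(i, i + 2) for i in range(0, 1_000_000, 6)]
# The same elements as a flat index tensor, for the fancy-indexing baseline
indices = torch.cat([torch.arange(a, b) for a, b in ranges])

t_fancy = benchmark.Timer(stmt="x[indices]", globals={"x": x, "indices": indices})
t_ranges = benchmark.Timer(
    stmt="torch.cat([x[a:b] for a, b in ranges])",
    globals={"torch": torch, "x": x, "ranges": ranges},
)
print(t_fancy.timeit(100))
print(t_ranges.timeit(100))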
I tried writing a C++ extension for this:

#include <torch/extension.h>
#include <utility>
#include <vector>

torch::Tensor select_ranges(torch::Tensor arr,
                            const std::vector<std::pair<int64_t, int64_t>>& ranges) {
  std::vector<torch::Tensor> chunks;
  chunks.reserve(ranges.size());
  for (const auto& range : ranges) {
    // slice() only creates a view; the single real copy happens in cat()
    chunks.push_back(arr.slice(0, range.first, range.second));
  }
  return torch::cat(chunks);
}
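For completeness, I'm JIT-compiling and loading it with torch.utils.cpp_extension.load_inline (the module name is an arbitrary choice of mine; load_inline supplies the torch/extension.h include and generates the pybind11 bindings itself):

import torch
from torch.utils.cpp_extension import load_inline

cpp_source = r"""
#include <utility>
#include <vector>

torch::Tensor select_ranges(torch::Tensor arr,
                            const std::vector<std::pair<int64_t, int64_t>>& ranges) {
  std::vector<torch::Tensor> chunks;
  chunks.reserve(ranges.size());
  for (const auto& range : ranges) {
    chunks.push_back(arr.slice(0, range.first, range.second));
  }
  return torch::cat(chunks);
}
"""

module = load_inline(
    name="select_ranges_ext",  # arbitrary name for the generated module
    cpp_sources=cpp_source,
    functions=["select_ranges"],  # auto-generates the pybind11 binding
)

out = module.select_ranges(torch.rand(10), [(2, 5), (6, 8)])  # list of tuples converts to std::vector<std::pair>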
…but it’s even slower!
Do you have any suggestions for improving this?
Or can you convince me that it’s a bad idea altogether?
Thanks!