Suppose I have a tensor x of shape (B, U, T), and another tensor t of shape (B, U, 2). The last dimension in t denotes start and end indices for the corresponding T dimension of x. I want to create a flattened tensor y which contains only the “selected” values of x, where the selection is based on the ranges given in t. Assume that I know the total size of y in advance (this is easy to compute as y_size = (t[:, :, 1] - t[:, :, 0]).sum()).

A naive loop based implement would look something like this:

offset = 0
for b in range(B):
for u in range(U):
start, end = t[b, u]
y[offset : offset + end - start] = x[b, u, start:end]
offset += end-start

Of course, this is slow since the loop runs B*U times. I am trying to come up with a more efficient way of doing this, perhaps using index_select(), but I am stuck.

I would appreciate any help if someone can think of a faster solution.

I was able to come up with the following mask based approach, but perhaps there is a more efficient way.

import torch.nn.functional as F
indices = torch.zeros_like(x)
indices += F.one_hot(t[..., 0], num_classes=T)
indices -= F.one_hot(t[..., 1], num_classes=T)
indices = torch.cumsum(indices, dim=-1).bool()
y = torch.masked_select(x, indices)

Thanks for your suggestion. It is indeed more readable than using one-hot and cumsum. I will try both methods and see if there it makes a difference in the throughput.

Update: After trying out both methods, I found that the method suggested by @KFrank is ~10% slower overall compared to the cumsum. I think this may be because it requires T point-wise comparisons, which is slower than just setting 2 values and doing cumsum on a single tensor.