Hello everyone!
I’m trying to implement the following function as efficiently as possible:
Given a sorted 1D tensor of non-negative integers, I want to retrieve, for each integer from 0 to N-1, the index of its first occurrence in the tensor. If an integer in that range does not appear in the tensor, I want -1 in its place. The function would thus return a 1D tensor of N elements.
Is there a more efficient way of doing this than a for loop over all values?

E.g.

x = torch.tensor([2, 3, 3, 4, 6, 6, 6, 8, 8], dtype=torch.long)
y = arg_first_ocurrence(x, N=10)

would return

>>> y
tensor([-1, -1, 0, 1, 3, -1, 4, -1, 7, -1])
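
For reference, the loop-based baseline mentioned above could look like the following minimal sketch. The name first_occurrence_loop and its body are assumptions for illustration, not code from the original post; it only pins down the expected behavior.

```python
import torch

def first_occurrence_loop(x, N):
    # Hypothetical baseline: one Python-level pass over the input.
    # y[v] holds the index of the first occurrence of v in x, or -1 if v is absent.
    y = [-1] * N
    for i, v in enumerate(x.tolist()):
        # Record only the first index seen for each value in [0, N-1].
        if 0 <= v < N and y[v] == -1:
            y[v] = i
    return torch.tensor(y, dtype=torch.long)

x = torch.tensor([2, 3, 3, 4, 6, 6, 6, 8, 8], dtype=torch.long)
print(first_occurrence_loop(x, N=10))
# tensor([-1, -1,  0,  1,  3, -1,  4, -1,  7, -1])
```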

I’m open to solutions that would involve coding a C++/CUDA extension.

I don’t know whether this meets your efficiency requirements, but
here is a scheme that uses no loops, only PyTorch tensor operations.

Note that msk = x == lbl.unsqueeze(1) uses broadcasting,
and therefore generates a two-dimensional tensor of shape
(len(lbl), len(x)) from your one-dimensional input, so its memory
and compute cost grow with the product of the two lengths.

Also, for convenience, I’ve eliminated the N argument and instead
just use the length of the input tensor.
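
A minimal sketch of a loop-free scheme along these lines is given below. It reuses the names lbl and msk from the note above, but the function name first_occurrence_broadcast, the choice lbl = torch.arange(len(x)), and the sentinel-plus-min step are assumptions for illustration and may differ from the script the reply had in mind.

```python
import torch

def first_occurrence_broadcast(x):
    # Hypothetical helper; N is taken to be len(x), as described above.
    n = x.numel()
    if n == 0:
        return x.new_empty(0, dtype=torch.long)
    lbl = torch.arange(n, device=x.device)   # candidate values 0 .. n-1
    pos = torch.arange(n, device=x.device)   # positions within x
    # Broadcasting (n,) against (n, 1) gives an (n, n) mask:
    # msk[v, i] is True exactly where x[i] == v.
    msk = x == lbl.unsqueeze(1)
    # Keep the position where the mask is True, else a sentinel n (larger than any index).
    cand = torch.where(msk, pos, torch.full_like(pos, n))
    # The smallest surviving position in each row is the first occurrence.
    first = cand.min(dim=1).values
    # Rows that still hold the sentinel correspond to missing values.
    first[first == n] = -1
    return first

x = torch.tensor([2, 3, 3, 4, 6, 6, 6, 8, 8], dtype=torch.long)
print(first_occurrence_broadcast(x))
# tensor([-1, -1,  0,  1,  3, -1,  4, -1,  7])
```

Because N is len(x) here, the result has 9 entries rather than the 10 in the original example, and values greater than or equal to len(x) are not reported. The intermediate tensors have len(x) * len(x) elements, so this trades memory for the absence of a Python loop.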