Complex indexing to avoid for-loop on GPU

Hello,

I am trying to use complex indexing to avoid for-loops on the GPU. My problem is as follows: I have a matrix X of shape [B, N, H], a second matrix H of shape [B, N, M, H] (initialized with zeros), and a list of indices L, where each entry holds the coordinates l = [b, n, m]. Here B is the batch size, N is the number of tokens in the sequence, M is the number of sub_tokens (the sub_tokens are drawn from the N tokens), and H is the hidden dimension.

Specifically, I want to populate H with rows sampled from X; however, not every token n in H receives M samples, so the unused slots stay zero. There is also no sampling across the batch dimension B. Let me illustrate with an example.

X = tensor([[[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11]],

            [[12, 13, 14, 15],
             [16, 17, 18, 19],
             [20, 21, 22, 23]]])

Index = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 2, 0], [0, 2, 1], [1, 0, 0], [1, 1, 1], [1, 2, 1], [1, 2, 2]]

H = tensor([[[[ 0,  1,  2,  3],
              [ 4,  5,  6,  7]],

             [[ 0,  1,  2,  3],
              [ 0,  0,  0,  0]],

             [[ 0,  1,  2,  3],
              [ 4,  5,  6,  7]]],


            [[[12, 13, 14, 15],
              [ 0,  0,  0,  0]],

             [[16, 17, 18, 19],
              [ 0,  0,  0,  0]],

             [[16, 17, 18, 19],
              [20, 21, 22, 23]]]])

If anyone has an idea how to solve this, that would be amazing!

Thank you,
Christoph
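
For readers with the same problem, here is a minimal sketch of one loop-free way the example above could be reproduced in PyTorch. It is not the approach the original poster ended up using. It reads each entry [b, n, m] as "copy X[b, m] into the next free slot of H[b, n]", which is what the example output suggests, and it assumes that the index list is already grouped by (b, n) as in the example and that no (b, n) pair appears more than M times. The function name `gather_sub_tokens` and the variable names `out` and `D` (used for the hidden dimension, to avoid the clash with the tensor H) are made up for illustration.

```python
import torch


def gather_sub_tokens(X, index, M):
    """Hypothetical helper: out[b, n, k] = X[b, m] for every (b, n, m) in index,
    where k is the position of the entry within its (b, n) group.

    Assumes index is grouped by (b, n) and that each group has at most M entries.
    """
    B, N, D = X.shape
    idx = torch.as_tensor(index, dtype=torch.long, device=X.device)
    b, n, m = idx.unbind(dim=1)

    # Flatten (b, n) into a single group id per index entry.
    group = b * N + n

    # Number of entries per group, and the offset at which each group starts.
    counts = torch.bincount(group, minlength=B * N)
    starts = torch.cumsum(counts, dim=0) - counts

    # Slot k of each entry = its position in the list minus its group's start.
    k = torch.arange(idx.shape[0], device=X.device) - starts[group]

    # Scatter the selected rows into a zero-initialized output in one indexing op.
    out = X.new_zeros((B, N, M, D))
    out[b, n, k] = X[b, m]
    return out


X = torch.arange(24).reshape(2, 3, 4)
index = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 2, 0], [0, 2, 1],
         [1, 0, 0], [1, 1, 1], [1, 2, 1], [1, 2, 2]]
out = gather_sub_tokens(X, index, M=2)
print(out)
```

The only step beyond plain advanced indexing is computing the slot k for each entry, which is done here with a bincount and a cumulative sum instead of a Python loop. If the index list were not grouped by (b, n), it would need to be sorted by group first.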

Hi,

I found another solution that tackles the problem from a different angle and does not require the complex indexing described in the post.

Thanks.