Hello,

I am trying to use advanced indexing to avoid for-loops on the GPU. My problem is as follows: I have a large tensor X of shape [B, N, H], a second tensor H of shape [B, N, M, H] (initialized with zeros), and a list of indices L, where each entry holds the coordinates l = [b, n, m]. Here B is the batch size, N is the number of tokens in the sequence, M is the number of sub_tokens (the sub_tokens are a subset of the N tokens), and H is the hidden dimension.

Specifically, I want to populate H with samples from X; however, not every token n in H receives M samples, so the unused slots stay zero. Also, there is no sampling across batch elements. Let me illustrate with an example.

```
X = tensor(
    [[[ 0,  1,  2,  3],
      [ 4,  5,  6,  7],
      [ 8,  9, 10, 11]],
     [[12, 13, 14, 15],
      [16, 17, 18, 19],
      [20, 21, 22, 23]]]
)
```

Index = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 2, 0], [0, 2, 1], [1, 0, 0], [1, 1, 1], [1, 2, 1], [1, 2, 2]]

```
H = tensor(
    [[[[ 0,  1,  2,  3],
       [ 4,  5,  6,  7]],
      [[ 0,  1,  2,  3],
       [ 0,  0,  0,  0]],
      [[ 0,  1,  2,  3],
       [ 4,  5,  6,  7]]],
     [[[12, 13, 14, 15],
       [ 0,  0,  0,  0]],
      [[16, 17, 18, 19],
       [ 0,  0,  0,  0]],
      [[16, 17, 18, 19],
       [20, 21, 22, 23]]]]
)
```
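For reference, here is a possible loop-free sketch in PyTorch that reproduces the example above. It rests on a reading of the example that is an assumption, not something stated explicitly: each entry [b, n, m] copies X[b, m] into the next free slot of H[b, n], and the entries are sorted so that all rows with the same (b, n) are contiguous. The names `index`, `src`, `group`, and `slot` are made up for illustration.

```python
import torch

# Example data from the post: B=2, N=3, hidden size 4, M=2 slots per token.
X = torch.arange(24).reshape(2, 3, 4)
index = torch.tensor([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 2, 0], [0, 2, 1],
                      [1, 0, 0], [1, 1, 1], [1, 2, 1], [1, 2, 2]])
B, N, hidden = X.shape
M = 2

# Assumed meaning of each row: batch b, destination token n, source token src.
b, n, src = index.unbind(dim=1)

# Assumption: rows are sorted by (b, n), so each group is contiguous.
group = b * N + n                                # flat id of each (b, n) group
counts = torch.bincount(group, minlength=B * N)  # entries per group
starts = torch.cumsum(counts, dim=0) - counts    # first row of each group

# Rank of each row inside its group, used as the slot along dimension M.
slot = torch.arange(index.shape[0], device=index.device) - starts[group]
# slot -> [0, 1, 0, 0, 1, 0, 0, 0, 1]

# Gather the source vectors and scatter them into H in one indexing step.
H = torch.zeros(B, N, M, hidden, dtype=X.dtype, device=X.device)
H[b, n, slot] = X[b, src]
```

This only works if no (b, n) group has more than M entries; otherwise `slot` would index past the end of dimension M. If the index list is not sorted by (b, n), it would need to be sorted first for the slot computation to hold.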

If anyone has an idea how to solve this, that would be amazing!

Thank you,

Christoph