Hello,

I have a working solution to the problem below, but it is extremely slow. If someone could help me speed it up, I would be extremely grateful.

I am trying to compute the average of certain elements of one array, grouped by the indices stored in another array. Please note that the same index pair may appear multiple times. The averages are stored in a separate, smaller array.

Here is an illustrative example:

```
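import numpy as np
import torch

# `device` was not defined in the original snippet; this choice is an assumption
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

result = torch.full((32, 256, 256), -1).to(device)
data = torch.full((32, 40000), 1).to(device)  # example of
count = np.zeros((256, 256))  # the shape must be passed as a tuple
indices = np.array([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])
for i, indx in enumerate(indices):
    if count[indx[0], indx[1]] == 0:
        # first hit on this cell: overwrite the -1 placeholder
        result[:, indx[0], indx[1]] = data[:, i]
    else:
        # repeated hit: accumulate the sum
        result[:, indx[0], indx[1]] += data[:, i]
    count[indx[0], indx[1]] += 1
mask_zero = (count == 0)
count[mask_zero] = 1  # avoid division by zero; cells never hit stay at -1
count = torch.Tensor(count).to(device)
average = result / count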
```

Is it possible to speed this up?
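For reference, the loop above amounts to a scatter-add followed by a division by the per-cell counts. Below is a minimal sketch of that formulation, not a verified drop-in replacement. It assumes `data` is a float tensor with exactly one column per index pair and that `indices` has already been converted to a long tensor (for example with `torch.from_numpy(indices).long()`). The function name `scatter_mean` and its parameters are made up for illustration.

```python
import torch

def scatter_mean(data, indices, height, width, fill=-1.0):
    """Hypothetical helper. data: (C, N) float, indices: (N, 2) long.
    Returns (C, height, width); cells that are never indexed keep `fill`."""
    channels = data.shape[0]
    # Flatten each (row, col) pair into a single cell id
    flat = indices[:, 0] * width + indices[:, 1]
    # Sum all columns that land on the same cell (duplicates accumulate)
    sums = torch.zeros(channels, height * width,
                       dtype=data.dtype, device=data.device)
    sums.index_add_(1, flat, data)
    # Count how many columns landed on each cell
    counts = torch.bincount(flat, minlength=height * width)
    hit = counts > 0
    mean = sums / counts.clamp(min=1).to(data.dtype)
    out = torch.where(hit, mean, torch.full_like(mean, fill))
    return out.view(channels, height, width)

# Usage with the indices from the example above (float data assumed)
indices = torch.tensor([[1, 2], [0, 0], [1, 2], [1, 1], [1, 0],
                        [0, 0], [1, 2], [2, 2], [2, 2], [0, 0]])
data = torch.ones(32, 40000)
average = scatter_mean(data[:, :len(indices)], indices, 256, 256)
```

Only the first `len(indices)` columns of `data` are used, which matches what the loop does when it reads `data[:, i]`.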

This handles only a single batch. Is it possible to generalize it to multiple batches, where the same arrays have shapes like these:

```
result = torch.full((batch, 32, 256, 256), -1).to(device)
data = np.arange(batch * 320).reshape(batch, 32, 10)
count = np.zeros((batch, 256, 256))
# indices has shape (batch, 10, 2)
```
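A minimal sketch of how the batched case could be expressed without a Python loop, using `scatter_add_` along a flattened spatial axis. It assumes `data` is a float tensor of shape `(batch, 32, 10)` and `indices` is a long tensor of shape `(batch, 10, 2)`. The name `batched_scatter_mean` and its parameters are made up for illustration, and this is untested at the sizes above.

```python
import torch

def batched_scatter_mean(data, indices, height, width, fill=-1.0):
    """Hypothetical helper. data: (B, C, N) float, indices: (B, N, 2) long.
    Returns (B, C, height, width); cells that are never indexed keep `fill`."""
    b, c, n = data.shape
    # Flatten each (row, col) pair into a single cell id, per batch: (B, N)
    flat = indices[..., 0] * width + indices[..., 1]
    # Sum values per cell; the index is broadcast over the channel axis
    sums = torch.zeros(b, c, height * width,
                       dtype=data.dtype, device=data.device)
    sums.scatter_add_(2, flat.unsqueeze(1).expand(-1, c, -1), data)
    # Count hits per cell, per batch
    counts = torch.zeros(b, height * width,
                         dtype=data.dtype, device=data.device)
    counts.scatter_add_(1, flat, torch.ones_like(flat, dtype=data.dtype))
    counts = counts.unsqueeze(1)  # (B, 1, H*W), broadcasts over channels
    mean = sums / counts.clamp(min=1)
    out = torch.where(counts > 0, mean, torch.full_like(mean, fill))
    return out.view(b, c, height, width)
```

Each batch element gets its own counts, so an index pair repeated in one batch element does not affect the averages in another.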

I have spent an enormous amount of time trying to optimize this solution, but couldn't. Any help is greatly appreciated.

Thanks in advance.