# Efficient and Fast way to traverse an array and compute mean

Hello,

I have a problem that I solved. However, my solution is extremely slow. If someone could help me speed up the solution I would be extremely grateful.

So I am trying to compute the average of certain elements in an array using indices of another array. Please note that the same indices may exist multiple times. The average is saved in a smaller dimension array.
Here is an illustrative example:

``````result = torch.full(( 32, 256, 256), -1).to(device)
data = torch.full(( 32, 40000), 1).to(device) #example of
count = np.zeros(256,256)
indices = np.array([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])

for i, indx in enumerate(indices):
if(count[indx , indx ] == 0):
result[:, indx , indx ] = data [:, i]
else:
result[:, indx , indx ] += data [:, i]
count[indx , indx ] += 1

count = torch.Tensor(count).to(device)
average= result/count

``````

Is it possible to speed this?
This is done for only one batch, is it even possible to generalize it to multiple batches, where the sizes of the same arrays are like:

``````result = torch.full((batch, 32, 256, 256), -1).to(device)
data = np.arange(320).reshape(batch,32,10)
count = np.zeros(batch,256,256)
indices shape is (batch, 10, 2)
``````

I have spent enormous amount of time trying to optimize this solution, but couldnt. Help is greatly appreciated.

I’m sure there are better approaches and you could speed it up more, but you could use `index_put_` with the `accumulate=True` argument instead of the for loop and precompute some indices.
Here is a code snippet:

``````def fun1():
result = torch.full(( 32, 256, 256), -1.)
data = torch.full(( 32, 40000), 1.)
count = torch.zeros((256,256))
indices = torch.tensor(([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]]))

for i, indx in enumerate(indices):
if(count[indx , indx ] == 0):
result[:, indx , indx ] = data [:, i]
else:
result[:, indx , indx ] += data [:, i]
count[indx , indx ] += 1

#count = torch.Tensor(count)
#average= result/count
uidx, ucount = indices.unique(dim=0, return_counts=True)
result[:, uidx[:, 0], uidx[:, 1]] /= ucount
return result#average

def fun2():
result2 = torch.full((32, 256, 256), -1.)
data = torch.full((32, 40000), 1.)
indices = torch.tensor([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])

# prime result2
result2[:, indices[:, 0], indices[:, 1]] = 0

# precompute fast_idx
fast_idx = torch.arange(32).unsqueeze(1).expand(-1, 10).transpose(1, 0).reshape(-1)

result2.index_put_(
(fast_idx,
indices[:, 0].unsqueeze(1).expand(-1, 32).reshape(-1),
indices[:, 1].unsqueeze(1).expand(-1, 32).reshape(-1)),
data[fast_idx, torch.arange(10).unsqueeze(1).expand(-1, 32).reshape(-1)],
accumulate=True)

uidx, ucount = indices.unique(dim=0, return_counts=True)
result2[:, uidx[:, 0], uidx[:, 1]] /= ucount
return result2

avg1 = fun1()
avg2 = fun2()
print((avg1 - avg2).abs().max())
> tensor(0.)

%timeit fun1()
> 6.32 ms ± 754 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit fun2()
> 5.75 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``````

which could potentially speed up your method.
Note that I removed the slower `count` calculation from your code and replaced it with the `unique` method, which yields a speedup on my setup.

2 Likes

Thank you for your help! Indeed I gained good speedup.

What if we level up the problem to include multiple batches (say 8) instead of just one. Noting that each batch can have different number of effective indicies. the ineffictive indicies can be filled with dummy data like (-1). Example

``````
indices_0 = torch.tensor(([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]]))
indices_1 = torch.tensor(([[1,0],[0,0],[-1,-1],[-1,-1],[1,1],[0,0],[1,0],[0,0],[0,1],[0,0]]))
indices_2 = torch.tensor(([[-1,-1],[0,0],[3,1],[0,0],[0,0],[1,1],[1,1],[1,1],[1,2],[2,0]]))
...
indices_7 = torch.tensor(([[2,2],[0,1],[2,2],[1,1],[1,2],[0,0],[1,0],[1,2],[2,1],[0,1]]))
``````

Here [-1,-1] is to be ignored. I indicated the ineffective indices with [-1,-1] just to spot them out. I can eliminate them from the tensor, but then I wont be able to put them in one tensor of shape (8,10,2) as they are of different lengths, unless there is a way around this too.

Those are 8 indices each representing a batch. The shape of all tensors will be as follows:
Whenever there is an ineffective (not to be counted) index then its corresponding data value will be [-1,-1,…,-1]. Again the -1’s are there to make all tensors of equal lengths.

``````result = torch.full((8, 32, 256, 256), -1).to(device)
data = torch.rand((8,32,40000))
indices shape is: (8, 10, 2)
``````

Hi,

Is there a way to use average pooling instead of summing and then taking the average?

Thanks.