Efficient and Fast way to traverse an array and compute mean

Hello,

I have a problem that I solved; however, my solution is extremely slow. If someone could help me speed up the solution, I would be extremely grateful.

I am trying to compute the average of certain elements in an array, using the indices stored in another array. Please note that the same index may appear multiple times. The averages are saved in an array of smaller dimensions.
Here is an illustrative example:

import numpy as np
import torch

result = torch.full((32, 256, 256), -1).to(device)
data = torch.full((32, 40000), 1).to(device)  # example data
count = np.zeros((256, 256))
indices = np.array([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])

for i, indx in enumerate(indices):
    if count[indx[0], indx[1]] == 0:
        result[:, indx[0], indx[1]] = data[:, i]
    else:
        result[:, indx[0], indx[1]] += data[:, i]
    count[indx[0], indx[1]] += 1

mask_zero = (count == 0)
count[mask_zero] = 1
count = torch.Tensor(count).to(device)
average = result / count

Is it possible to speed this up?
This is done for only one batch. Is it even possible to generalize it to multiple batches, where the shapes of the same arrays would be:

result = torch.full((batch, 32, 256, 256), -1).to(device)
data = np.arange(batch * 32 * 10).reshape(batch, 32, 10)
count = np.zeros((batch, 256, 256))
# indices shape is (batch, 10, 2)

I have spent an enormous amount of time trying to optimize this solution, but couldn't. Help is greatly appreciated.
Thanks in advance.

I’m sure there are better approaches that could speed it up even more, but you could use index_put_ with the accumulate=True argument instead of the for loop and precompute some indices.
Here is a code snippet:

def fun1():
    result = torch.full((32, 256, 256), -1.)
    data = torch.full((32, 40000), 1.)
    count = torch.zeros((256, 256))
    indices = torch.tensor([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])

    for i, indx in enumerate(indices):
        if count[indx[0], indx[1]] == 0:
            result[:, indx[0], indx[1]] = data[:, i]
        else:
            result[:, indx[0], indx[1]] += data[:, i]
        count[indx[0], indx[1]] += 1

    # original count-based division, replaced by the unique() call below
    #mask_zero = (count == 0)
    #count[mask_zero] = 1
    #count = torch.Tensor(count)
    #average = result / count

    # divide every written cell by how often its index occurred
    uidx, ucount = indices.unique(dim=0, return_counts=True)
    result[:, uidx[:, 0], uidx[:, 1]] /= ucount
    return result  # average


def fun2():
    result2 = torch.full((32, 256, 256), -1.)
    data = torch.full((32, 40000), 1.)
    indices = torch.tensor([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])

    # prime result2: zero out the cells that will be written so the -1 init
    # doesn't end up in the sums
    result2[:, indices[:, 0], indices[:, 1]] = 0

    # precompute fast_idx: the channel indices 0..31, repeated once per index pair
    fast_idx = torch.arange(32).unsqueeze(1).expand(-1, 10).transpose(1, 0).reshape(-1)

    # scatter-add every (channel, row, col) contribution in a single call
    result2.index_put_(
        (fast_idx,
         indices[:, 0].unsqueeze(1).expand(-1, 32).reshape(-1),
         indices[:, 1].unsqueeze(1).expand(-1, 32).reshape(-1)),
        data[fast_idx, torch.arange(10).unsqueeze(1).expand(-1, 32).reshape(-1)],
        accumulate=True)

    # divide every written cell by how often its index occurred
    uidx, ucount = indices.unique(dim=0, return_counts=True)
    result2[:, uidx[:, 0], uidx[:, 1]] /= ucount
    return result2

avg1 = fun1()
avg2 = fun2()
print((avg1 - avg2).abs().max())
> tensor(0.)

%timeit fun1()
> 6.32 ms ± 754 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit fun2()
> 5.75 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

which could potentially speed up your method.
Note that I removed the slower count calculation from your code and replaced it with the unique method, which yields a speedup on my setup.


Thank you for your help! Indeed, I gained a good speedup.

What if we level up the problem to include multiple batches (say 8) instead of just one? Note that each batch can have a different number of effective indices; the ineffective indices can be filled with dummy data like (-1). Example:


indices_0 = torch.tensor([[1,2],[0,0],[1,2],[1,1],[1,0],[0,0],[1,2],[2,2],[2,2],[0,0]])
indices_1 = torch.tensor([[1,0],[0,0],[-1,-1],[-1,-1],[1,1],[0,0],[1,0],[0,0],[0,1],[0,0]])
indices_2 = torch.tensor([[-1,-1],[0,0],[3,1],[0,0],[0,0],[1,1],[1,1],[1,1],[1,2],[2,0]])
...
indices_7 = torch.tensor([[2,2],[0,1],[2,2],[1,1],[1,2],[0,0],[1,0],[1,2],[2,1],[0,1]])

Here [-1,-1] is to be ignored. I marked the ineffective indices with [-1,-1] just to make them easy to spot. I could eliminate them from the tensor, but then I wouldn't be able to stack the batches into one tensor of shape (8, 10, 2), since they would have different lengths, unless there is a way around this too.

Those are 8 index tensors, one per batch. Whenever there is an ineffective (not to be counted) index, its corresponding data value will be [-1, -1, …, -1]; again, the -1's are only there to make all tensors the same length. The shapes of all tensors will be as follows:

result = torch.full((8, 32, 256, 256), -1).to(device)
data = torch.rand((8, 32, 40000))
# indices shape is (8, 10, 2)
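
For concreteness, here is a rough, untested sketch of how the [-1, -1] padding could be masked out before a batched index_put_ with accumulate=True, following the shapes above (the variable names are just placeholders, not from the snippet above):

import torch

batch, C, H, W, N = 8, 32, 256, 256, 10

# assumed inputs, following the shapes described above
data = torch.rand(batch, C, N)
indices = torch.randint(0, 3, (batch, N, 2))
indices[1, 2:4] = -1                                   # padded entries to be ignored

result = torch.full((batch, C, H, W), -1.)
count = torch.zeros(batch, H, W)

# keep only the effective entries; checking one coordinate is enough
# since padding is always the full pair [-1, -1]
valid = indices[..., 0] >= 0                           # (batch, N) boolean mask
b_idx, p_idx = valid.nonzero(as_tuple=True)            # flat lists of valid (batch, point) pairs
r_idx = indices[b_idx, p_idx, 0]
c_idx = indices[b_idx, p_idx, 1]

# prime the cells that will be written so the -1 init doesn't enter the sums
result[b_idx, :, r_idx, c_idx] = 0.

# expand the index lists across the channel dimension and scatter-add the data
n = b_idx.numel()
ch = torch.arange(C).repeat(n)                         # 0..C-1 for every valid pair
bb = b_idx.repeat_interleave(C)
rr = r_idx.repeat_interleave(C)
cc = c_idx.repeat_interleave(C)
vals = data[b_idx, :, p_idx].reshape(-1)               # matches the (bb, ch, rr, cc) order
result.index_put_((bb, ch, rr, cc), vals, accumulate=True)

# per-cell hit counts, then divide (clamp avoids dividing untouched -1 cells by 0)
count.index_put_((b_idx, r_idx, c_idx), torch.ones(n), accumulate=True)
average = result / count.clamp(min=1).unsqueeze(1)

This keeps everything in one (8, 10, 2) tensor and simply drops the padded rows via the boolean mask, so the per-batch lengths don't have to match.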

Hi,

Is there a way to use average pooling instead of summing and then taking the average?

Thanks.