Vectorize a for loop over variable-length indexes

Hi, I would like to know if there is a better way to vectorize the following calculation instead of using for loops:

I have a 2D matrix A and an index matrix B (containing only 0s and 1s) of the same shape. For each column of A, I want to take the row indices where B is 1, select the corresponding rows of A, and then run a second for loop that computes an iterative sum:
for i in row_index: A[1:i] = A[1:i] * w1 + A[0:i-1] * w2

The key point is that the number of indices may differ from one column of B to another.

I would like to know if there is a better way to vectorize the above than using two for loops. Any ideas are appreciated, thanks!

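This looks recurrent: for scalar cells, each updated value depends on the value in the row above it, which may itself have just been updated.

If I understood the computation correctly, the inner loop is not parallelizable.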
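To make the dependency concrete, here is a minimal sketch of the scalar-cell reading for a single column. It assumes the update is `a[i] = a[i] * w1 + a[i - 1] * w2` applied in ascending order of the selected rows, and that row 0 is skipped because it has no predecessor. The function name `scalar_recurrence` and both assumptions are mine, not taken from the question.

```python
def scalar_recurrence(a, rows, w1, w2):
    """Apply a[i] = a[i] * w1 + a[i - 1] * w2 for each selected row i.

    Assumption: rows are processed in ascending order and row 0 is skipped.
    """
    a = list(a)  # work on a copy so the input is left untouched
    for i in sorted(rows):
        if i == 0:
            continue  # no row above to read from
        # a[i - 1] may already hold an updated value: this is the recurrence
        a[i] = a[i] * w1 + a[i - 1] * w2
    return a


col = [1.0, 2.0, 3.0, 4.0]
print(scalar_recurrence(col, rows=[1, 2, 3], w1=0.5, w2=2.0))
# prints [1.0, 3.0, 7.5, 17.0]
```

Row 2 uses the already updated value of row 1 (3.0), and row 3 uses the updated row 2 (7.5), so under this reading the steps within one column have to run in order.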

However, the outer loop should be vectorizable, since the columns are independent. The basic idea is to convert the indexes into a mask: in your case, this means expanding w1 and w2 into 2D arrays and setting w1=1, w2=0 at the positions that are not selected, so those cells are left unchanged. After that you should be able to process all columns at once.
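A rough NumPy sketch of the mask idea, again assuming the scalar-cell reading (`A[i, j] = A[i, j] * w1 + A[i - 1, j] * w2` wherever `B[i, j]` is 1, with row 0 skipped). The names `masked_update` and `loop_reference` are hypothetical, and the second function is only there to check the result against the two-loop version. The same pattern should carry over to other array libraries.

```python
import numpy as np


def masked_update(A, B, w1, w2):
    """One loop over rows; every column is updated at once."""
    A = np.array(A, dtype=float)      # copy, so the caller's array is kept
    selected = np.asarray(B).astype(bool)
    W1 = np.where(selected, w1, 1.0)  # unselected cells: multiply by 1
    W2 = np.where(selected, w2, 0.0)  # unselected cells: add 0 * previous row
    for i in range(1, A.shape[0]):    # row 0 has no predecessor (assumption)
        A[i] = A[i] * W1[i] + A[i - 1] * W2[i]
    return A


def loop_reference(A, B, w1, w2):
    """Two-loop version of the same reading, used only for comparison."""
    A = np.array(A, dtype=float)
    B = np.asarray(B)
    for j in range(A.shape[1]):
        for i in np.flatnonzero(B[:, j]):  # selected rows, ascending
            if i > 0:
                A[i, j] = A[i, j] * w1 + A[i - 1, j] * w2
    return A


A = np.arange(12, dtype=float).reshape(4, 3)
B = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
out = masked_update(A, B, w1=0.5, w2=2.0)
print(np.allclose(out, loop_reference(A, B, w1=0.5, w2=2.0)))
```

The row loop stays because of the recurrence, but the variable number of indices per column no longer matters: unselected cells go through the same arithmetic with w1=1, w2=0. This only holds for finite values, since 0 multiplied by inf or nan would not leave a cell unchanged.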
