Help me speed up a nested loop?

sudddddd · May 10, 2020, 4:20pm

I am using a nested for loop to get outputs from my model for 1000 images, comparing every image with every image (itself included). The model takes 2 images and outputs a scalar. A is a tensor of size (1000, 3, 128, 128) and B = A. But it is taking a long time to get the output.

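B = A  # compare every image in A with every image in A
out = []
for i in A:                       # i: one image, shape (3, 128, 128)
    temp = []
    for j in B:
        temp.append(model(i, j))  # one scalar per (i, j) pair
    out.append(temp)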

The final shape of out is (1000, 1000). The code above only illustrates the task; 1000 is arbitrary.
In practice I use a batch size of 8 instead of 1 (as above) to speed things up, but it is still very slow. Can anyone help me with this? I would also like to know whether there is a way to avoid the duplicate comparisons, since the out matrix is symmetric.

ptrblck · May 11, 2020, 1:07am

Could you explain the comparison inside your model a bit?
For example, if you pass two batches of indices [0, 1, 2] and [0, 1, 2], will the model output the scalars for 0-0, 1-1, 2-2, or will it also compute all other combinations?

In the former case, you could increase the inner batch (j) by one while increasing the outer batch by batch_size.
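A related way to batch in the former case (the model compares position k of one batch with position k of the other) is to enumerate every (i, j) index pair once and feed the pairs to the model in large chunks. This is a minimal sketch, not the poster's code: `toy_model` is a hypothetical stand-in (cosine similarity of the flattened images) and `all_pairs` / `batch_size` are illustrative names.

```python
import torch
import torch.nn.functional as F


def toy_model(x, y):
    # Hypothetical stand-in for the real model: two batches of images
    # of shape (B, 3, H, W) -> one scalar per pair, shape (B,).
    return F.cosine_similarity(x.flatten(1), y.flatten(1), dim=1)


@torch.no_grad()
def all_pairs(model, A, batch_size=256):
    """Evaluate model(A[i], A[j]) for every (i, j), batch_size pairs at a time."""
    n = A.shape[0]
    idx = torch.arange(n, device=A.device)
    # Every (i, j) combination, shape (n * n, 2), in row-major order.
    pairs = torch.cartesian_prod(idx, idx)
    scores = []
    for start in range(0, pairs.shape[0], batch_size):
        chunk = pairs[start:start + batch_size]
        # Gather both images of each pair and run the whole chunk at once.
        scores.append(model(A[chunk[:, 0]], A[chunk[:, 1]]))
    # Row-major order means the flat scores reshape directly into (n, n).
    return torch.cat(scores).view(n, n)


A = torch.randn(10, 3, 16, 16)
out = all_pairs(toy_model, A, batch_size=8)
print(out.shape)
```

With a real `nn.Module`, calling `model.eval()` first matters: layers such as BatchNorm in training mode mix statistics across the batch, so batched results would no longer match one-pair-at-a-time results.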

You could add a check for the indices as:

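out = []
for index_i, i in enumerate(A):
    temp = []
    for index_j, j in enumerate(B):
        if index_j > index_i:  # or index_j >= index_i, if you don't want to compare the same indices
            continue
        temp.append(model(i, j))  # only pairs with index_j <= index_i are computed
    out.append(temp)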
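The same index check can also be done up front: `torch.tril_indices` returns exactly the (index_i, index_j) pairs with index_j <= index_i that the loop above keeps, ordered row by row, so those pairs could then be sent to the model in batches. `kept_pairs` is a hypothetical helper name used only for this sketch.

```python
import torch


def kept_pairs(n, include_diagonal=True):
    """(index_i, index_j) pairs with index_j <= index_i, as a (2, K) tensor.

    include_diagonal=False corresponds to the `index_j >= index_i` variant
    of the check, i.e. no image is compared with itself.
    """
    offset = 0 if include_diagonal else -1
    # Row 0 holds index_i, row 1 holds index_j.
    return torch.tril_indices(n, n, offset=offset)


ij = kept_pairs(4)
print(ij.shape)  # torch.Size([2, 10]): 4 * 5 / 2 pairs
```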

sudddddd · May 11, 2020, 1:54am

If you pass my model the batches ([0,1,2], [0,1,2]), it will output a scalar for each of the cases 0-0, 1-1 and 2-2, i.e. my output will be 3 scalars.

As I mentioned in my question, I am using a batch size of 8: I replicate each i 8 times and pair the copies with 8 elements of the inner batch j. But this is too slow, and I want to optimise this part.
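The replication described here can be written with `expand`, which creates a broadcast view of A[i] instead of materialising copies; the inner batch size then becomes a single parameter, and raising it well above 8 (as far as memory allows) is likely the main lever. This is a sketch under assumptions: `toy_model` is a hypothetical stand-in for the real model and `one_row` is an illustrative name.

```python
import torch
import torch.nn.functional as F


def toy_model(x, y):
    # Hypothetical stand-in: two (B, 3, H, W) batches -> (B,) scalars.
    return F.cosine_similarity(x.flatten(1), y.flatten(1), dim=1)


@torch.no_grad()
def one_row(model, A, i, batch_size=8):
    """Compute model(A[i], A[j]) for every j, pairing A[i] with chunks of A."""
    scores = []
    for start in range(0, A.shape[0], batch_size):
        right = A[start:start + batch_size]
        # Broadcast A[i] to the chunk's batch size without copying memory.
        left = A[i].unsqueeze(0).expand(right.shape[0], -1, -1, -1)
        scores.append(model(left, right))
    return torch.cat(scores)


A = torch.randn(10, 3, 16, 16)
row = one_row(toy_model, A, i=3, batch_size=4)
print(row.shape)  # torch.Size([10])
```

An expanded tensor is non-contiguous; if the real model calls `.view` on its inputs, passing `left.contiguous()` avoids an error at the cost of a copy.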

Yes, I could do that, but is there a way to combine it with the previous optimisation (if one is found) to speed the task up further?

ptrblck · May 11, 2020, 1:59am

I’m not sure if there is another way to optimize this.
Assuming you are dealing with N images, your model would have to calculate
N + (N-1) + (N-2) + ... + 1 = N(N+1)/2 comparisons.
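As a quick check of that count: for N = 1000 the triangle needs 500,500 model evaluations instead of the 1,000,000 of the full matrix, and 499,500 if self-comparisons are skipped. `num_comparisons` is an illustrative helper, not part of any library.

```python
def num_comparisons(n, include_diagonal=True):
    """Number of model calls needed for one triangle of an n x n symmetric matrix."""
    if include_diagonal:
        return n * (n + 1) // 2  # N + (N-1) + ... + 1
    return n * (n - 1) // 2      # (N-1) + ... + 1, no self-comparisons


n = 1000
print(num_comparisons(n))         # 500500, versus n * n = 1000000
print(num_comparisons(n, False))  # 499500
```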
You could speed it up using batches, but if I understand your explanation correctly, you are already doing that without any unnecessary calculations.
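One way to combine both ideas, assuming the model is symmetric (model(a, b) == model(b, a)), is to evaluate only the lower-triangle pairs from `torch.tril_indices` in large batches and then mirror the scores into the upper triangle. This is a sketch, not a tested recipe for the poster's model: `toy_model` is a hypothetical symmetric stand-in and `symmetric_pairwise` / `batch_size` are illustrative names. If the real model is not symmetric, the mirroring step gives wrong results and the full set of pairs is needed.

```python
import torch
import torch.nn.functional as F


def toy_model(x, y):
    # Hypothetical symmetric stand-in for the real model:
    # two batches of images (B, 3, H, W) -> one scalar per pair, shape (B,).
    return F.cosine_similarity(x.flatten(1), y.flatten(1), dim=1)


@torch.no_grad()
def symmetric_pairwise(model, A, batch_size=256):
    """Fill an (N, N) matrix by evaluating only pairs with j <= i, then mirroring.

    Assumes model(a, b) == model(b, a).
    """
    n = A.shape[0]
    rows, cols = torch.tril_indices(n, n, device=A.device)  # N(N+1)/2 pairs
    scores = []
    for start in range(0, rows.numel(), batch_size):
        r = rows[start:start + batch_size]
        c = cols[start:start + batch_size]
        scores.append(model(A[r], A[c]))  # one forward pass per chunk of pairs
    scores = torch.cat(scores)
    out = torch.empty(n, n, dtype=scores.dtype, device=A.device)
    out[rows, cols] = scores  # lower triangle, diagonal included
    out[cols, rows] = scores  # mirror into the upper triangle
    return out


A = torch.randn(10, 3, 16, 16)
out = symmetric_pairwise(toy_model, A, batch_size=16)
print(out.shape)  # torch.Size([10, 10])
```

The batch size here counts pairs, so it can typically be set far higher than 8; the practical limit is memory for 2 * batch_size images per forward pass.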