Parallel For Loop for parallelized sub computation in a gradient step


I have a simple problem (hopefully!) regarding parallelizing a part of my model. I have looked around but cannot find a definitive answer on how to approach this, although similar questions have been asked before.

TLDR: How do you do a parallel for loop across multiple CPUs or GPUs in the same computer in the middle of a gradient step?

What I have is multiple additional computations which I know are embarrassingly parallel but compute-bound. Currently, I compute them sequentially in a for loop within a .fit() function. The results are accumulated and then combined to produce the final loss.

A code sketch is as follows:

# Outer training loop (inside .fit())
for i in range(epochs):
    loss = self.fit_get_loss()

def fit_get_loss(self):
    # This is the for loop to parallelize
    total_loss = 0
    for j in range(self.N_extra_models):
        m1 = self.extra_models_1[j]
        m2 = self.extra_models_2[j]
        loss1 = m1.fit_get_loss(with_grad=True)
        loss2 = m2.fit_get_loss(with_grad=True)
        total_loss = total_loss + loss1 + loss2
    return total_loss

I would like to distribute the computation across many CPUs (e.g. a workstation with 20 cores) so that, for example, 20 of those j iterations run in parallel and I just accumulate the loss value. Doing this on CPU is preferable given the hardware I currently have available. However, I am also keen to know how to apply the same approach to multiple GPUs.

Actually, the real problem I have is more complicated than the above, and I need to re-use the extra_models, but I think the above is the simplest form of what I’m trying to achieve. The extension of the problem is that I would like to access the updated fitted values of each of those extra_models within the main optimization loop.

Thanks again.


You can use Python’s native threading to achieve this, although I am not very familiar with it :confused:

Keep in mind, though, that if your individual PyTorch operations are big enough, PyTorch will already use multiple threads for them automatically.
If what you have is mostly a lot of Python code to run, multiple threads won’t help, because only one thread can execute Python code at a time (search for the GIL, the global interpreter lock).
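
For what it's worth, here is a minimal sketch of the threading idea using only the standard library's `concurrent.futures.ThreadPoolExecutor`. `ToyModel`, `pair_loss`, `parallel_fit_get_loss` and `max_workers` are hypothetical names made up for this illustration; only `fit_get_loss(with_grad=True)` comes from the question. The toy models return plain floats so the snippet runs without PyTorch. With real models returning loss tensors, the same summation should keep the autograd graph intact, but I have not verified that for this exact setup, and the GIL caveat above still applies to whatever part of each iteration is pure Python.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Hypothetical stand-in for one of the extra models in the question."""

    def __init__(self, value):
        self.value = value

    def fit_get_loss(self, with_grad=True):
        # A real model would run its forward pass here and return a loss tensor.
        return self.value ** 2


def pair_loss(pair):
    """Compute the combined loss of one (m1, m2) pair, i.e. one j iteration."""
    m1, m2 = pair
    return m1.fit_get_loss(with_grad=True) + m2.fit_get_loss(with_grad=True)


def parallel_fit_get_loss(extra_models_1, extra_models_2, max_workers=4):
    """Run every j iteration in a thread pool and accumulate the losses."""
    pairs = list(zip(extra_models_1, extra_models_2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the accumulation order is fixed.
        losses = list(pool.map(pair_loss, pairs))
    total_loss = 0
    for loss in losses:
        total_loss = total_loss + loss
    return total_loss


models_1 = [ToyModel(v) for v in (1.0, 2.0, 3.0)]
models_2 = [ToyModel(v) for v in (4.0, 5.0, 6.0)]
total = parallel_fit_get_loss(models_1, models_2)
print(total)
```

Because threads share one process, the model objects in `models_1` and `models_2` are the same objects the main loop sees, so any state they update while fitting stays accessible afterwards. That would not hold automatically with separate processes.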