I have a simple problem (hopefully!) regarding parallelizing part of my model. I have looked around but can't find a definitive answer on how to approach this, although similar questions have been asked.
TLDR: How do you do a parallel for loop across multiple CPUs or GPUs in the same computer in the middle of a gradient step?
What I have is multiple additional computations which I know are embarrassingly parallel and compute-bound. Currently, I am calculating them sequentially in a for loop within a .fit() function. The results are accumulated and then combined to produce the final loss.
A code sketch is as follows:
```python
for i in range(epochs):
    self.optimizer.zero_grad()
    loss = self.fit_get_loss()
    loss.backward(retain_graph=False)
    self.optimizer.step()  # omitted in my original sketch; added for completeness

def fit_get_loss(self):
    # This is the for loop to parallelize
    total_loss = 0
    for j in range(self.N_extra_models):
        m1 = self.extra_models_1[j]
        m2 = self.extra_models_2[j]
        loss1 = m1.fit_get_loss(with_grad=True)
        loss2 = m2.fit_get_loss(with_grad=True)
        total_loss = total_loss + loss1 + loss2
    return total_loss
```
I would like to distribute the computation across many CPUs (e.g. a workstation with 20 cores) such that, for example, I have 20 of those j iterations occurring in parallel and just accumulate the loss values. CPU is preferable given the hardware I currently have available. However, I am also keen to know how to apply the same problem to multiple GPUs.
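To make the CPU case concrete, is something like this thread-pool version the right direction? It is only a sketch: `fit_get_loss_parallel` and `one_pair` are placeholder names of mine, and it assumes the heavy torch ops inside each `fit_get_loss(with_grad=True)` release the GIL so the threads genuinely run in parallel, with autograd recording all the forward work into one shared graph.

```python
import torch
from concurrent.futures import ThreadPoolExecutor

def fit_get_loss_parallel(self, max_workers=20):
    # One unit of parallel work: the j-th pair of extra models.
    def one_pair(j):
        loss1 = self.extra_models_1[j].fit_get_loss(with_grad=True)
        loss2 = self.extra_models_2[j].fit_get_loss(with_grad=True)
        return loss1 + loss2

    # Threads (not processes) keep everything in one autograd graph;
    # any real speedup relies on the torch ops releasing the GIL.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_losses = list(pool.map(one_pair, range(self.N_extra_models)))

    return torch.stack(partial_losses).sum()
```

For the multi-GPU variant, I picture pinning each model pair to its own device and letting autograd handle the cross-device sum. Again just a sketch under my assumptions (each `extra_models_*[j]` has already been moved to its device):

```python
def fit_get_loss_multi_gpu(self):
    # Sketch: assumes extra_models_1[j] / extra_models_2[j] already
    # live on device cuda:{j % torch.cuda.device_count()}.
    total_loss = torch.zeros((), device="cuda:0")
    for j in range(self.N_extra_models):
        loss_j = (self.extra_models_1[j].fit_get_loss(with_grad=True)
                  + self.extra_models_2[j].fit_get_loss(with_grad=True))
        # CUDA kernels launch asynchronously, so work on different
        # devices overlaps even though this Python loop is sequential.
        total_loss = total_loss + loss_j.to("cuda:0")
    return total_loss
```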
Actually, the real problem I have is more complicated than the above, and I need to re-use the extra_models, but I think the above is the simplest form of what I'm trying to achieve. The extension of the problem is that I would like to access the updated fitted values for each of those extra_models within the main optimization loop.
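To make that extension concrete, I imagine each parallel call returning the per-model fitted values alongside its loss, so the main loop can cache them. `fitted_values` here is a hypothetical attribute standing in for wherever my real submodels store their updated fits after `fit_get_loss` runs:

```python
def one_pair_with_values(self, j):
    m1 = self.extra_models_1[j]
    m2 = self.extra_models_2[j]
    loss = m1.fit_get_loss(with_grad=True) + m2.fit_get_loss(with_grad=True)
    # `fitted_values` is a placeholder for however the submodel
    # exposes its updated fit; returned so the main loop can use it.
    return loss, (m1.fitted_values, m2.fitted_values)
```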