I’m curious about how to parallelize the for-loop in a federated learning simulation, namely the loop that iterates over the selected clients to train their local models within each communication round. Most simulations run through the selected clients serially (a minimal example of what I mean is sketched below), but I would like this for-loop to execute in parallel.
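For concreteness, here is a minimal version of the serial loop I am referring to (the toy model, random per-client data, and FedAvg-style averaging are just placeholders so the snippet is self-contained):

```python
import torch
import torch.nn as nn

# Placeholder setup: a toy model and random per-client data, only to illustrate the loop.
global_model = nn.Linear(10, 1)
client_data = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]  # 4 "clients"

for round_idx in range(3):                 # communication rounds
    local_states = []
    for x, y in client_data:               # <-- the loop I want to run in parallel
        local_model = nn.Linear(10, 1)
        local_model.load_state_dict(global_model.state_dict())
        opt = torch.optim.SGD(local_model.parameters(), lr=0.1)
        for _ in range(5):                 # local epochs
            opt.zero_grad()
            nn.functional.mse_loss(local_model(x), y).backward()
            opt.step()
        local_states.append(local_model.state_dict())
    # FedAvg-style aggregation: average the local parameters
    avg_state = {k: torch.stack([s[k] for s in local_states]).mean(0)
                 for k in local_states[0]}
    global_model.load_state_dict(avg_state)
```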
Specifically, given N GPUs and K clients, there are three possible relationships between N and K. Starting with the case N == K, I would like each GPU to train a model on one client’s data in parallel. To achieve this, I imagine spawning N processes, with each process running one client’s training task; a rough sketch of what I have in mind is below, but I am not sure it is the right approach and would appreciate some guidance on where to start. Additionally, I am wondering whether this problem could also be solved with DDP (DistributedDataParallel).
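To make the question concrete, here is my best guess at the structure, using torch.multiprocessing with the spawn start method so that process rank r trains one client’s model on cuda:r. The toy model, random client data, and FedAvg-style averaging are placeholders to keep it self-contained; I am not claiming this is correct or idiomatic.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def train_one_client(rank, global_state, x, y, result_queue):
    """Train a local copy of the global model on one client's data, on GPU `rank`."""
    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(10, 1).to(device)
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = x.to(device), y.to(device)
    for _ in range(5):  # local epochs
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    # Move the trained weights back to CPU before sending them to the parent process.
    cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}
    result_queue.put((rank, cpu_state))


def main():
    n_gpus = torch.cuda.device_count()
    assert n_gpus > 0, "this sketch assumes N == K, one GPU per client"
    global_model = nn.Linear(10, 1)
    client_data = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(n_gpus)]

    ctx = mp.get_context("spawn")          # CUDA requires the "spawn" start method
    for round_idx in range(3):             # communication rounds
        result_queue = ctx.Queue()
        procs = []
        for rank, (x, y) in enumerate(client_data):
            p = ctx.Process(target=train_one_client,
                            args=(rank, global_model.state_dict(), x, y, result_queue))
            p.start()
            procs.append(p)
        local_states = [result_queue.get()[1] for _ in procs]  # collect before join
        for p in procs:
            p.join()
        # FedAvg-style aggregation of the local models
        avg_state = {k: torch.stack([s[k] for s in local_states]).mean(0)
                     for k in local_states[0]}
        global_model.load_state_dict(avg_state)


if __name__ == "__main__":
    main()
```

My main doubts are whether passing the global state dict and a queue through ctx.Process like this is a reasonable way to get the trained weights back to the parent for aggregation, and whether DDP would be a better fit than plain multiprocessing here.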