I am trying to train N independent models on one machine using M GPUs in parallel. What I want to achieve is: train the N models, M at a time in parallel, for a given number of epochs; store the output returned by each model until all are done; process the stored outputs; and repeat for a number of rounds.
Each client has a `device` property holding a GPU id, and model parameters are assigned to that device before training. The `device_dict` dictionary has one key per GPU, containing the list of client ids assigned to that device. For concreteness, the layout I have in mind is roughly the following (the ids and device keys here are illustrative, not my actual configuration):
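```python
# Illustrative only: keys are GPU ids, values are the client ids pinned to that GPU
device_dict = {
    "cuda:0": [0, 2, 4],
    "cuda:1": [1, 3, 5],
}
# Each client stores its assigned device, e.g. self.clients[0].device == "cuda:0"
```

Here is what I have implemented so far (untested); I am unsure whether this is the best way of doing it: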
```python
import math
import multiprocessing as mp
from multiprocessing import Queue

def train_mp(self, num_rounds, train_epochs):
    # Queue collecting each client's logits for the server update after each round
    logit_queue = Queue()

    for _ in range(num_rounds):
        self.round += 1
        diffusion_seed = self.server.generate_seed()
        server_logit = self.server.get_logit()

        # Launch at most one client per GPU, then wait for that batch,
        # so that only num_devices clients train at a time
        for i in range(math.ceil(self.num_clients / self.num_devices)):
            processes = []
            for device, client_ids in self.device_dict.items():
                if i < len(client_ids):
                    process = mp.Process(
                        target=self.client_update,
                        args=(self.clients[client_ids[i]], server_logit,
                              diffusion_seed, logit_queue),
                    )
                    process.start()
                    processes.append(process)

            # Wait for this batch of processes to finish before starting the next
            for process in processes:
                process.join()

        # Update server model with the collected client logits
        self.server.knowledge_distillation(logit_queue)
```
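An alternative I was considering, to avoid re-spawning a process per client every round, is one long-lived worker per GPU that pulls client ids from a per-device task queue. Below is a self-contained sketch of that pattern; the dummy `train_one_client` stands in for my real `client_update`, and the sentinel handling and the `"spawn"` start method are assumptions on my part:

```python
import multiprocessing as mp

def train_one_client(device, client_id, logit_queue):
    # Stand-in for the real client_update: train on `device`, emit logits
    logit_queue.put((client_id, f"logits from client {client_id} on {device}"))

def gpu_worker(device, task_queue, logit_queue):
    # One long-lived process per GPU: pull client ids until the None sentinel
    while True:
        client_id = task_queue.get()
        if client_id is None:
            break
        train_one_client(device, client_id, logit_queue)

if __name__ == "__main__":
    mp.set_start_method("spawn")  # CUDA generally requires 'spawn', not 'fork'
    device_dict = {"cuda:0": [0, 2], "cuda:1": [1, 3]}  # illustrative

    logit_queue = mp.Queue()
    task_queues = {d: mp.Queue() for d in device_dict}
    workers = [
        mp.Process(target=gpu_worker, args=(d, task_queues[d], logit_queue))
        for d in device_dict
    ]
    for w in workers:
        w.start()

    # Enqueue this round's work, then one sentinel per worker
    for device, client_ids in device_dict.items():
        for cid in client_ids:
            task_queues[device].put(cid)
        task_queues[device].put(None)

    # Drain results before joining, so workers are not blocked flushing the queue
    num_results = sum(len(ids) for ids in device_dict.values())
    results = [logit_queue.get() for _ in range(num_results)]
    for w in workers:
        w.join()
    print(results)
```

With this layout each GPU runs one client at a time by construction, and the workers amortize process start-up cost across rounds, though I have not been able to benchmark either version.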
I currently do not have access to a multi-GPU machine to test any of this, so I am unsure what the best way of doing it would be. Any help would be appreciated.