I am trying to train N independent models on one machine using M GPUs in parallel. What I want to achieve is: train the N models, M at a time in parallel, for a given number of epochs; store the output returned by each model until all are done; process the stored outputs; and repeat for a number of rounds.
Each client has a `device` property holding a GPU id, and model parameters are assigned to that device before training. The `device_dict` dictionary has one key per GPU, containing the list of client ids assigned to that device. For concreteness, the layout I have in mind is roughly the following (the ids and device keys here are illustrative, not my actual configuration):
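```python
# Illustrative only: keys are GPU ids, values are the client ids pinned to that GPU
device_dict = {
    "cuda:0": [0, 2, 4],
    "cuda:1": [1, 3, 5],
}
# Each client stores its assigned device, e.g. self.clients[0].device == "cuda:0"
```

Here is what I have implemented so far (untested); I am unsure whether this is the best way of doing it: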
```python
import math
import multiprocessing as mp
from multiprocessing import Queue

def train_mp(self, num_rounds, train_epochs):
    # Queue collecting each client's logits for the server update after each round
    logit_queue = Queue()

    for _ in range(num_rounds):
        self.round += 1
        diffusion_seed = self.server.generate_seed()
        server_logit = self.server.get_logit()

        # Launch at most one client per GPU, then wait for that batch,
        # so that only num_devices clients train at a time
        for i in range(math.ceil(self.num_clients / self.num_devices)):
            processes = []
            for device, client_ids in self.device_dict.items():
                if i < len(client_ids):
                    process = mp.Process(
                        target=self.client_update,
                        args=(self.clients[client_ids[i]], server_logit,
                              diffusion_seed, logit_queue),
                    )
                    process.start()
                    processes.append(process)

            # Wait for this batch of processes to finish before starting the next
            for process in processes:
                process.join()

        # Update server model with the collected client logits
        self.server.knowledge_distillation(logit_queue)
```
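An alternative I was considering, to avoid re-spawning a process per client every round, is one long-lived worker per GPU that pulls client ids from a per-device task queue. Below is a self-contained sketch of that pattern; the dummy `train_one_client` stands in for my real `client_update`, and the sentinel handling and the `"spawn"` start method are assumptions on my part:

```python
import multiprocessing as mp

def train_one_client(device, client_id, logit_queue):
    # Stand-in for the real client_update: train on `device`, emit logits
    logit_queue.put((client_id, f"logits from client {client_id} on {device}"))

def gpu_worker(device, task_queue, logit_queue):
    # One long-lived process per GPU: pull client ids until the None sentinel
    while True:
        client_id = task_queue.get()
        if client_id is None:
            break
        train_one_client(device, client_id, logit_queue)

if __name__ == "__main__":
    mp.set_start_method("spawn")  # CUDA generally requires 'spawn', not 'fork'
    device_dict = {"cuda:0": [0, 2], "cuda:1": [1, 3]}  # illustrative

    logit_queue = mp.Queue()
    task_queues = {d: mp.Queue() for d in device_dict}
    workers = [
        mp.Process(target=gpu_worker, args=(d, task_queues[d], logit_queue))
        for d in device_dict
    ]
    for w in workers:
        w.start()

    # Enqueue this round's work, then one sentinel per worker
    for device, client_ids in device_dict.items():
        for cid in client_ids:
            task_queues[device].put(cid)
        task_queues[device].put(None)

    # Drain results before joining, so workers are not blocked flushing the queue
    num_results = sum(len(ids) for ids in device_dict.values())
    results = [logit_queue.get() for _ in range(num_results)]
    for w in workers:
        w.join()
    print(results)
```

With this layout each GPU runs one client at a time by construction, and the workers amortize process start-up cost across rounds, though I have not been able to benchmark either version.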
I currently do not have access to a multi-GPU machine to test any of this, so I am unsure what the best way of doing it would be. Any help would be appreciated.