Two models in distributed data parallel

Hi, I have two models with the following characteristics, and I am trying to implement DDP (DistributedDataParallel).

Model 1: a CNN-type network
Model 2: takes the encoder feature maps from Model 1 as input

The gradients for the two models are calculated from the overall loss and back-propagated, with each model updated by its own optimiser (a simplified sketch of the training step follows the list below).

Overall loss = Loss 1 + Loss 2
Loss 1: Model 1's cross-entropy and Dice loss
Loss 2: calculated by comparing Model 2's prediction with the Loss 1 output
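
Roughly, a single training step looks like the sketch below (simplified; model1, model2, dice_loss and loss2_fn are placeholders for my actual modules and loss functions, and I assume for the sketch that model1 returns both its prediction and its encoder feature maps):

import torch.nn.functional as F

def train_step(model1, model2, opt1, opt2, dice_loss, loss2_fn, images, targets):
    opt1.zero_grad()
    opt2.zero_grad()

    # Model 1 (CNN): prediction plus the encoder feature maps for Model 2
    pred1, encoder_feats = model1(images)
    loss1 = F.cross_entropy(pred1, targets) + dice_loss(pred1, targets)

    # Model 2: its input is Model 1's encoder feature maps
    pred2 = model2(encoder_feats)
    loss2 = loss2_fn(pred2, pred1)  # compares Model 2's prediction with the Loss 1 output

    # Overall loss = Loss 1 + Loss 2, single backward pass through both models
    loss = loss1 + loss2
    loss.backward()

    opt1.step()
    opt2.step()
    return loss.item()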

In this case, I want both models to be on the same GPU during training, since they are somewhat dependent on each other.

  1. Can we achieve this by creating the process group once, then wrapping each model with the DDP wrapper and specifying the same local rank for both networks? (code below)

os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = str(port)
dist.init_process_group("nccl", rank=rank, world_size=world_size)
# set the default CUDA device for this process before wrapping the models
torch.cuda.set_device(torch.device('cuda', self.local_rank))

# both models are wrapped on the same device in this process
self.model1 = DDP(self.model1, device_ids=[self.local_rank])
self.model2 = DDP(self.model2, device_ids=[self.local_rank])
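
In case it helps, here is a minimal self-contained sketch of how I launch this: one process per GPU, where each process initialises the process group once and wraps both models on its own device (the placeholder conv layers and the port number are just examples standing in for my real networks and settings):

import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size, port):
    # One process per GPU: each process creates the process group once and
    # wraps both models on its own device, so the two models always share
    # the same GPU within a given process.
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = str(port)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Placeholder modules standing in for my real networks.
    model1 = DDP(nn.Conv2d(3, 8, 3, padding=1).to(f"cuda:{rank}"), device_ids=[rank])
    model2 = DDP(nn.Conv2d(8, 1, 3, padding=1).to(f"cuda:{rank}"), device_ids=[rank])

    # ... training loop using both wrapped models ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size, 12355), nprocs=world_size)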

Please advise me on this!

Thank you!