My code involves two stages, like:
model_a -> do something and model_b -> do something
And I use DistributedDataParallel (DDP) rather than DataParallel (DP) to accelerate them. So I do something like this:
setup DDP and model_a = DDP(model_a) -> do something -> setup DDP and model_b = DDP(model_b) -> do something
This causes a problem, because you cannot initialize the default process group (i.e., set up DDP) twice in one process.
So I call dist.destroy_process_group() between the two stages. But when I use this function, the program hangs forever.
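To make the flow concrete, here is a minimal sketch of what the two stages look like. The nn.Linear models and the "do something" steps are placeholders for my real code, and setup() / cleanup() are the helpers shown below:

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    # Stage 1: initialize a process group and train model_a.
    setup(rank, world_size)
    model_a = DDP(nn.Linear(10, 10))  # placeholder for my real model_a
    # ... do something with model_a ...
    cleanup()  # <-- this is where the program hangs

    # Stage 2: re-initialize the process group and train model_b.
    setup(rank, world_size)
    model_b = DDP(nn.Linear(10, 10))  # placeholder for my real model_b
    # ... do something with model_b ...
    cleanup()
```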
Here are my DDP setup and teardown functions:
```python
import os

import torch
import torch.distributed as dist


def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'

    # Initialize the process group.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Explicitly setting seed to make sure that models created in two processes
    # start from same random weights and biases.
    torch.manual_seed(42)


def cleanup():
    dist.destroy_process_group()
```
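I launch the workers roughly like this (a minimal sketch; world_size = 2 is just illustrative, and run() is the two-stage worker above):

```python
import torch.multiprocessing as mp

if __name__ == "__main__":
    world_size = 2  # illustrative number of processes
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```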
I've tried both the gloo and nccl backends.
Environment: PyTorch 1.1, Python 3.6, CUDA 9.0.