Hi everyone. I have run into a problem with PyTorch DDP on a single node with multiple GPUs.
My setup is as follows:
```python
os.environ["MASTER_PORT"] = "9999"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
.....
distributed_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
torch_dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, pin_memory=True, num_workers=4, sampler=distributed_sampler)
model.cuda()
model = torch.nn.parallel.DistributedDataParallel(model)
```
But this setup is slower than DataParallel, and I get the following warning:
UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES.
GCP ml-engine image_uri: gcr.io/cloud-ml-public/training/pytorch-gpu.1-7
gpu_type: complex_model_m_p100 (p100x4 on single node)
I hope someone can help with this. I would really appreciate it.