torch.distributed.init_process_group

rony · February 15, 2019, 9:53pm

I am new to PyTorch and cannot work out how torch.distributed.init_process_group works. I could not follow the documentation, and I have not found anything useful on Google. Please help me.

rony · February 17, 2019, 2:26pm

Any help would be appreciated, please.

RicCu · February 17, 2019, 3:04pm

Hi, check out this tutorial.

rony · February 17, 2019, 5:12pm

https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Distributed : this link opens a project in which multiproc.py launches main.py through subprocess.Popen(). main.py is where init_process_group() is called and torch.nn.parallel.DistributedDataParallel() is used. Can you explain the pipeline here? How are the copies of the model on the different GPUs handled, and how do these two scripts implement data parallelism?
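
For reference, here is a rough sketch of the launcher pattern described above. It is not the actual contents of multiproc.py; the function names build_worker_env and launch, the default address and the default port are hypothetical. It assumes the workers use the env:// init method, which reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK from the environment. The launcher part uses only the standard library; what each worker would then do with PyTorch is shown in comments.

```python
import os
import subprocess
import sys


def build_worker_env(rank, world_size, master_addr="127.0.0.1", master_port=29500):
    """Return a copy of the current environment plus the four variables
    that init_process_group(init_method="env://") reads."""
    env = dict(os.environ)
    env.update({
        "MASTER_ADDR": master_addr,       # where rank 0 listens for the rendezvous
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": str(world_size),    # total number of worker processes
        "RANK": str(rank),                # unique id of this worker, 0..world_size-1
    })
    return env


def launch(script_args, world_size):
    """Start one worker process per rank and return their exit codes."""
    workers = [
        subprocess.Popen([sys.executable] + list(script_args),
                         env=build_worker_env(rank, world_size))
        for rank in range(world_size)
    ]
    return [worker.wait() for worker in workers]


# Inside each worker script (assumed, requires PyTorch and one GPU per process):
#   rank = int(os.environ["RANK"])
#   torch.distributed.init_process_group(backend="nccl", init_method="env://")
#   torch.cuda.set_device(rank)   # single-node case: rank doubles as GPU index
#   model = torch.nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[rank])
# Every process keeps its own full copy of the model and works on its own
# share of the data; DistributedDataParallel averages the gradients across
# processes during backward(), so the copies stay in sync after each step.
```

With this sketch, launch(["main.py"], world_size=4) would start four copies of main.py, one per rank.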