Hello everyone,
I was trying to use the PyTorch distributed package; however, I came across the following error:
Traceback (most recent call last):
  File "train_parallel_ch_classifier.py", line 385, in <module>
    main(args)
  File "train_parallel_ch_classifier.py", line 35, in main
    world_size = args.world_size)
  File "/z/sw/packages/pytorch/0.2.0/lib/python2.7/site-packages/torch/distributed/__init__.py", line 46, in init_process_group
    group_name, rank)
RuntimeError: world_size was not set in config at /z/tmp/build/pytorch-0.2.0/torch/lib/THD/process_group/General.cpp:17
(The traceback appears once per process; the interleaved duplicate lines are cleaned up above.) I am using Python 2.7 with PyTorch 0.2 installed from source. Below is how I initialize the process group:
dist.init_process_group(backend='gloo',
                        init_method='/z/home/mbanani/nonexistant',
                        world_size=args.world_size)
Any thoughts on what may be causing this or how I can fix it?
Thank you