MPI backend does not work!

lkywk · December 20, 2017, 12:20pm

I am using MPI as the backend for distributed PyTorch across multiple nodes.

Launch command:
mpirun --hostfile hostfile -n 2 python *.py

I also revised the initialization so that it looks like the init line in PyTorch's distributed_test.py file.

The initialization code:
os.environ['MASTER_ADDR'] = '172.31.7.117'
os.environ['MASTER_PORT'] = '23456'

dist.init_process_group(init_method='env://', backend='mpi')
group = dist.new_group([i for i in range(dist.get_world_size())])

I am sure that Open MPI is installed correctly, and PyTorch is built from source with MPI support.
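
For reference, here is a minimal sketch of a check that the launch itself works, independent of PyTorch. It assumes the launcher is Open MPI's mpirun, which exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to every process it starts. The function name describe_mpi_env is just a helper name made up for this illustration, not part of any library.

```python
import os
import socket


def describe_mpi_env(environ=None):
    """Return the rank information that Open MPI's mpirun exports to a process.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without actually running under mpirun.
    """
    env = os.environ if environ is None else environ
    # Open MPI sets these variables for every process it launches.
    # They are absent when the script is started without mpirun.
    rank = env.get("OMPI_COMM_WORLD_RANK")
    size = env.get("OMPI_COMM_WORLD_SIZE")
    return {
        "launched_by_mpirun": rank is not None and size is not None,
        "rank": int(rank) if rank is not None else None,
        "world_size": int(size) if size is not None else None,
    }


if __name__ == "__main__":
    # Each process prints its host name and what it sees from the launcher.
    print(socket.gethostname(), describe_mpi_env())
```

Run with the same mpirun line as above, each of the 2 processes should report its own rank and a world size of 2, and the host names should match the entries in the hostfile. If that holds, the launch side is probably fine and the problem is more likely in the PyTorch initialization.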

I would really appreciate any help.