Hello everyone. I'm using MPI (via `mpirun`) to launch the processes and the NCCL backend with DDP. Is this a correct way to combine MPI and NCCL? I'd appreciate it if anybody could help me! Thanks in advance!
Here is my sample code:
```python
import os  # needed for the OMPI_* environment variables below
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def dist_train(rank, size):
    # Open MPI exposes the per-node rank; use it to pick this process's GPU
    local_rank = int(os.environ['OMPI_COMM_WORLD_LOCAL_RANK'])
    if args.gpu:
        torch.cuda.set_device(local_rank)
    # set torch device
    device = torch.device("cuda" if args.gpu and torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model = DDP(model, device_ids=[local_rank])
    '''training code......'''

def init_process(rank, size, fn, backend='gloo'):
    dist.init_process_group(backend, init_method='tcp://master_ip:port',
                            rank=rank, world_size=size)
    fn(rank, size)

world_size = int(os.environ['OMPI_COMM_WORLD_SIZE'])
world_rank = int(os.environ['OMPI_COMM_WORLD_RANK'])
init_process(world_rank, world_size, dist_train, backend='nccl')
```
My running command is: `mpirun -np ${totals} -H ${slots} ${COMMON_MPI_PARAMETERS} python demo.py`
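In case it helps anyone reproduce the rendezvous part in isolation, here is a minimal single-process sanity check of `init_process_group`. This is only a sketch: the address `127.0.0.1:29500` and `world_size=1` are placeholders I chose for a local run, and `gloo` stands in for `nccl` so it also works on a CPU-only machine.

```python
import torch
import torch.distributed as dist

# Single-process rendezvous: rank 0 binds the (assumed free) local port.
dist.init_process_group(backend='gloo',
                        init_method='tcp://127.0.0.1:29500',
                        rank=0, world_size=1)

t = torch.ones(2)
dist.all_reduce(t)   # sums across all ranks; with one rank the tensor is unchanged
print(t.tolist())    # [1.0, 1.0]

dist.destroy_process_group()
```

If this runs cleanly, the same `init_method`/rank/world-size plumbing should work when the ranks come from the `OMPI_*` variables instead.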