Hi,
I have 4 GPUs for training, and I launch training with:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 WORLD_SIZE=4 python -m torch.distributed.launch --nproc_per_node=4 --master_port 49611 train.py
```
I know that `spawn` calls the function passed to it once in each spawned process.
When I use the following code to initialize training, which function will be launched multiple times? When will `local_rank` be set to 0, 1, 2, or 3, respectively? Is there any doc/link about how the `init_process_group` function initializes the processes?
```python
def trainer():
    # code
    if args.distributed:
        # Bind this process to its assigned GPU before creating the process group
        torch.cuda.set_device(args.local_rank)
        device = torch.device('cuda:{}'.format(args.local_rank))
        # env:// tells PyTorch to read MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE
        # from environment variables set by the launcher
        torch.distributed.init_process_group(backend='nccl', init_method='env://')
        args.world_size = torch.distributed.get_world_size()
        args.rank = torch.distributed.get_rank()
    else:
        _logger.info('Training with a single process on 1 GPU.')
    assert args.rank >= 0
```
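For context, here is my mental model of what the launcher does: `torch.distributed.launch` starts `nproc_per_node` copies of `train.py`, and each copy gets its own rank-related environment variables, which `init_process_group(init_method='env://')` later reads. A rough single-node sketch (`launcher_env` is a hypothetical helper for illustration, not part of PyTorch; the real launcher also passes `--local_rank` as a command-line argument to each worker):

```python
def launcher_env(local_rank, nproc_per_node, master_port=49611):
    """Roughly the environment each spawned copy of train.py would see.

    Hypothetical sketch for a single node, where rank == local_rank and
    world size == nproc_per_node.
    """
    return {
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": str(nproc_per_node),  # 4 in the command above
        "RANK": str(local_rank),            # global rank
        "LOCAL_RANK": str(local_rank),      # GPU index on this node
    }

# The launcher forks nproc_per_node worker processes, so trainer()
# ends up running 4 times, each with a different rank:
for lr in range(4):
    print(launcher_env(lr, 4))
```

If this model is right, each of the 4 processes runs the whole script with a different `local_rank`, rather than one process calling `trainer()` 4 times.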