I’m trying to train a PyTorch model on 2 GPUs.
import os
import torch
from apex.parallel import DistributedDataParallel as DDP

# multi gpu
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # no space after the comma
torch.distributed.init_process_group(backend='nccl',
                                     init_method='env://')
# model = nn.DataParallel(model, output_device=1)
model = DDP(model, delay_allreduce=True)
I added the init_process_group and DDP parts, but:
- it does not run without errors
- I don't know what backend and init_method mean
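For context, my understanding so far: `backend` picks the communication library (`nccl` for GPU, `gloo` for CPU), and `init_method='env://'` means `init_process_group` reads the rendezvous info (`MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`) from environment variables instead of a URL. A minimal single-process sketch of that, using `gloo` so it runs without a GPU (the address/port values are just placeholders I picked):

```python
import os
import torch.distributed as dist

# env:// makes init_process_group read these four variables.
# Normally a launcher sets them per process; here we set them by
# hand for a single-process, CPU-only check.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"

dist.init_process_group(backend="gloo", init_method="env://")
print(dist.get_world_size())
dist.destroy_process_group()
```

In a real two-GPU run I believe these variables are supposed to be set for each process by a launcher, e.g. `python -m torch.distributed.launch --nproc_per_node=2 train.py` — is that the missing piece?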