ncclInvalidUsage of torch.nn.parallel.DistributedDataParallel

fangwei123456 (Fangwei123456) November 1, 2021, 9:33am 8

I solved this problem by changing
net.to(f'cuda:{args.local_rank}')
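
For anyone hitting the same error, here is a minimal sketch of per-process device placement around the line above. It is not necessarily the exact change made in this post. It assumes one process per GPU, that the process group has already been initialized (e.g. with `torch.distributed.init_process_group`), and that the local rank comes either from `args.local_rank` or from the `LOCAL_RANK` environment variable set by `torchrun`. The helper names `device_for_rank`, `local_rank_from_env`, and `wrap_model` are hypothetical and only for illustration.

```python
import os


def device_for_rank(local_rank: int) -> str:
    """Build the device string for this process, e.g. 'cuda:0' for rank 0."""
    return f"cuda:{local_rank}"


def local_rank_from_env(default: int = 0) -> int:
    """Read LOCAL_RANK (set by torchrun); fall back to `default` if unset."""
    return int(os.environ.get("LOCAL_RANK", default))


def wrap_model(net, local_rank: int):
    """Move `net` to this process's GPU and wrap it in DDP.

    Assumption: the process group is already initialized and each
    process owns exactly one GPU, indexed by `local_rank`.
    """
    # Imported lazily so the helpers above work without a CUDA install.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Make this GPU the default device for the current process.
    torch.cuda.set_device(local_rank)
    net = net.to(device_for_rank(local_rank))
    # Tell DDP which single device this replica lives on.
    return DDP(net, device_ids=[local_rank], output_device=local_rank)
```

The idea is that every process ends up on a different GPU. If two ranks place their model on the same device, NCCL can report an invalid usage error, so it is worth printing `device_for_rank(local_rank)` in each process to check.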


Properly implementing DDP in training loop with cleanup, barrier, and its expected output