Can you share a minimal repro of train_net.py, especially how you call init_process_group and the DistributedDataParallel constructor?
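For reference, a minimal DDP repro usually looks roughly like the sketch below (the model and training step are placeholders, not your actual train_net.py; it assumes a single-node NCCL setup launched with torchrun):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # 1) The init_process_group call.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)

    # 2) The DistributedDataParallel ctor.
    model = nn.Linear(10, 10).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])

    # Placeholder forward/backward step.
    out = ddp_model(torch.randn(4, 10, device=local_rank))
    out.sum().backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=2 repro.py` (script name is just an example). If your script deviates from this pattern, those call sites are exactly what we'd like to see.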
BTW, could you please add a “distributed” tag to future torch.distributed-related posts? That way the PT distributed team can get back to you promptly.