Error when using DDP on multiple GPUs

Can you share a minimal repro of train_net.py, especially how you call init_process_group and the DistributedDataParallel constructor?
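
For reference, here's a minimal sketch of the shape of repro that's easiest to debug (this assumes one process per GPU launched with torchrun, which sets RANK, WORLD_SIZE, and LOCAL_RANK; your actual model and setup will differ):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each process
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # NCCL backend for multi-GPU; rank/world size are read from env vars
    dist.init_process_group(backend="nccl")

    # Placeholder model; replace with the model from your train_net.py
    model = torch.nn.Linear(10, 10).cuda(local_rank)
    # Pin each replica to its own GPU via device_ids
    ddp_model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    out = ddp_model(torch.randn(20, 10).cuda(local_rank))
    out.sum().backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=2 train_net.py`. If your code deviates from this pattern (e.g. spawning processes manually, or passing rank/world_size explicitly), please include that part too, since that's where DDP errors usually originate.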

BTW, could you please add a “distributed” tag to future torch.distributed-related posts, so that the PT distributed team can get back to you promptly?