I'm trying to run train_net.py on Windows, on a machine with 4 GPUs. The same setup works fine on Linux, but on Windows I get:
raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for tcp://
Does this have anything to do with nccl vs gloo? Can I force a gloo backend to get DDP working in detectron2?
Are there changes I would have to make to get this working in Windows?
I believe it would involve changes to detectron2\engine\launch.py
Would it be something like:
dist.init_process_group(backend="gloo", init_method="file:///c:/libtmp/test.txt", world_size=world_size, rank=global_rank)
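Spelled out a bit more, here is a minimal sketch of what I have in mind. It assumes the Windows build of PyTorch ships Gloo but not NCCL, and that a file:// rendezvous works where tcp:// does not. The helper names (pick_backend, file_init_method, init_distributed) are hypothetical and not part of detectron2 or PyTorch; only dist.init_process_group is the real call. Where exactly this would go in launch.py is a guess on my part.

```python
import sys


def pick_backend(platform: str = sys.platform) -> str:
    """Choose a torch.distributed backend name for the given platform.

    Assumption: NCCL is unavailable on Windows, so fall back to Gloo there.
    """
    return "gloo" if platform.startswith("win") else "nccl"


def file_init_method(path: str) -> str:
    """Turn a local path into a file:// init_method URL.

    Backslashes become forward slashes and any leading slash is dropped,
    so both Windows and POSIX paths end up with exactly three slashes
    after "file:".
    """
    normalized = path.replace("\\", "/").lstrip("/")
    return "file:///" + normalized


def init_distributed(rank: int, world_size: int, sync_file: str) -> None:
    """Hypothetical stand-in for the init call made per worker process.

    torch is imported lazily so the helpers above can be checked
    without a PyTorch install.
    """
    import torch.distributed as dist

    dist.init_process_group(
        backend=pick_backend(),
        init_method=file_init_method(sync_file),
        world_size=world_size,
        rank=rank,
    )
```

For the path in my example, file_init_method(r"c:\libtmp\test.txt") gives file:///c:/libtmp/test.txt. Every rank would have to be given the same sync_file, and as far as I understand it the file should not be left over from a previous run.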