I looked through the detectron2 code and found that it builds a new process group for each machine. I haven't learned much about distributed systems, and I'm curious why they do that. I searched for an explanation but couldn't find one. Can anyone explain this?
I would recommend reading the PyTorch documentation for the distributed communication package, torch.distributed (PyTorch 1.8.1 documentation).
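For a concrete picture, here is a minimal sketch of the per-machine group pattern the question describes. It assumes `torch.distributed.init_process_group` has already been called, that every machine runs the same number of processes, and that global ranks are assigned contiguously per machine. The helper names `local_group_ranks` and `build_local_group` are made up for illustration and are not detectron2's API. One likely motivation, as far as I can tell, is that a per-machine group lets a process find its local rank and synchronize only with the other processes on the same host.

```python
def local_group_ranks(num_machines, gpus_per_machine):
    """Return the list of global ranks that live on each machine.

    Assumes ranks are laid out contiguously: machine 0 owns
    ranks [0, gpus_per_machine), machine 1 the next block, and so on.
    """
    return [
        list(range(m * gpus_per_machine, (m + 1) * gpus_per_machine))
        for m in range(num_machines)
    ]


def build_local_group(machine_rank, num_machines, gpus_per_machine):
    """Create one process group per machine and return this machine's group.

    Hypothetical helper; requires torch.distributed to be initialized.
    """
    # Imported here so the rank helper above works without PyTorch installed.
    import torch.distributed as dist

    local_group = None
    for m, ranks in enumerate(local_group_ranks(num_machines, gpus_per_machine)):
        # new_group must be called by every process in the job,
        # including processes that are not members of the new group.
        group = dist.new_group(ranks)
        if m == machine_rank:
            local_group = group
    return local_group
```

With the returned group, `dist.get_rank(group=local_group)` gives the rank within the machine, and `dist.barrier(group=local_group)` waits only for processes on the same host.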