What is the difference between dist.all_reduce_multigpu and dist.all_reduce

In a single-node multi-GPU setup I have used dist.all_reduce. Will it also work for a multi-node multi-GPU setup, or do I have to use dist.all_reduce_multigpu?
In general, what's the difference between the two?

Thanks for posting the question @Rakshith_V. The multigpu versions of the collectives are meant for the case where one rank manages multiple GPUs. For example, if you have 2 nodes with 8 GPUs each but launch only world_size=2 processes, so that each process manages 8 GPUs, you would need these collectives to run the operation across all of the GPUs.
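A minimal sketch of that case, assuming 2 nodes with 8 GPUs each, one process per node (world_size=2), and MASTER_ADDR/MASTER_PORT already set in the environment. The tensor shape and the SUM op are just illustrative:

```python
import torch
import torch.distributed as dist

# One process per node, world_size=2; each process drives 8 local GPUs.
dist.init_process_group(backend="nccl", init_method="env://")

# One tensor per local GPU. all_reduce_multigpu reduces across every
# tensor in every list on every rank, i.e. 2 ranks * 8 GPUs = 16 tensors.
tensors = [torch.ones(10, device=f"cuda:{i}") for i in range(8)]
dist.all_reduce_multigpu(tensors, op=dist.ReduceOp.SUM)

# After the call, each tensor on each GPU holds the sum over all 16 tensors.
```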

But if you launch world_size=16 processes (one process per GPU), you can simply use all_reduce instead of all_reduce_multigpu.
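For comparison, a sketch of the one-process-per-GPU case, assuming the 16 processes are launched with something like torchrun --nnodes=2 --nproc_per_node=8 so that LOCAL_RANK is set for each process:

```python
import os
import torch
import torch.distributed as dist

# 16 processes in total (world_size=16), one per GPU.
dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])  # set by the launcher, e.g. torchrun
torch.cuda.set_device(local_rank)

# Each process owns exactly one tensor on its own GPU.
tensor = torch.ones(10, device=f"cuda:{local_rank}")
dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
# tensor now holds the sum contributed by all 16 processes.
```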

See the docs here: Distributed communication package - torch.distributed — PyTorch 1.11.0 documentation
