Difference Between broadcast_coalesced and dist_broadcast_coalesced

I want to experiment with my own distributed data parallel strategy, and I have read through the code in torch.distributed. But I am confused by the following two C++ functions. It seems that these two methods do the same job, namely broadcasting tensors from one GPU to the others.

My PyTorch version is v1.0.0.

  1. broadcast_coalesced in torch/csrc/cuda/comm.cpp
  2. distBroadcastCoalesced in torch/csrc/distributed/c10d/ddp.cpp

I am wondering about the difference in usage between these two methods. Thanks!

broadcast_coalesced is used in a single-process situation, where one process controls multiple GPUs. distBroadcastCoalesced is used when there are multiple processes, and each process makes this call.
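The "coalesced" part of both names refers to packing many small tensors into one flat buffer so they can be transferred in a single copy instead of one per tensor. Here is a minimal CPU sketch of that packing step using the internal `torch._utils` flatten/unflatten helpers (the actual broadcast_coalesced does the device-to-device copy on the flat buffer; the helpers and shapes here are just for illustration):

```python
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

# Two small tensors that would otherwise each need their own broadcast.
tensors = [torch.arange(4, dtype=torch.float32),
           torch.ones(2, 3)]

# Coalesce: pack all tensors into one contiguous 1-D buffer.
flat = _flatten_dense_tensors(tensors)
print(flat.shape)  # one buffer holding all 10 elements

# Conceptually, `flat` is what gets copied to the other devices in one
# shot; each device then unpacks it back into per-tensor views.
unpacked = _unflatten_dense_tensors(flat, tensors)
for orig, out in zip(tensors, unpacked):
    assert torch.equal(orig, out)
```

The round trip is lossless, which is why DDP can bucket gradients this way without changing their values.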


FWIW, the function in ddp.cpp should be considered a private API, only to be used by DistributedDataParallel. For the upcoming release, it was removed and replaced with a function also called broadcast_coalesced in the c10d namespace. See https://github.com/pytorch/pytorch/blob/caa0d0c/torch/csrc/distributed/c10d/comm.h. See https://github.com/pytorch/pytorch/pull/20234 for the PR that added it.
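For experimenting with your own strategy, the supported route is the public torch.distributed collectives rather than these private C++ entry points. A minimal sketch of the multi-process pattern they serve, using the gloo backend so it runs on CPU (world_size=1 here only to keep the demo self-contained; in real use every worker process runs the same call with its own rank):

```python
import os
import torch
import torch.distributed as dist

# Rendezvous setup for a single-process demo; in real DDP each worker
# process runs this same script with its own rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# Every process calls broadcast collectively: rank `src` supplies the
# data, all other ranks receive it in-place into their own tensor.
t = torch.arange(6, dtype=torch.float32)
dist.broadcast(t, src=0)

assert torch.equal(t, torch.arange(6, dtype=torch.float32))
dist.destroy_process_group()
```

This is the "each process makes this call" pattern described above; the c10d broadcast_coalesced helper layers the bucketing optimization on top of the same collective.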