Difference Between broadcast_coalesced and dist_broadcast_coalesced

I want to experiment with my own distributed data parallel strategy, and I have read through the code in torch.distributed. But I am confused by the following two C++ functions. They seem to do the same job, namely broadcasting tensors from one GPU to the others.

My PyTorch version is v1.0.0.

  1. broadcast_coalesced in torch/csrc/cuda/comm.cpp
  2. distBroadcastCoalesced in torch/csrc/distributed/c10d/ddp.cpp

What is the difference in usage between these two functions? Thanks!

broadcast_coalesced is used in the single-process case, where one process controls multiple GPUs and copies the tensors to each of them itself. distBroadcastCoalesced is used when there are multiple processes, and every process makes this call.
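
To make the "coalesced" part concrete, here is a minimal pure-Python sketch of the idea both functions share. It is an illustration under stated assumptions, not the PyTorch implementation: plain lists of floats stand in for tensors, and the `FakeRoot` class and `broadcast` callback are hypothetical stand-ins for the real collective. The point is only that several small buffers are packed into one flat buffer, sent with a single call, and split back into their original shapes.

```python
from typing import Callable, List

Buffer = List[float]


def flatten(buffers: List[Buffer]) -> Buffer:
    """Pack several small buffers into one contiguous flat buffer."""
    flat: Buffer = []
    for buf in buffers:
        flat.extend(buf)
    return flat


def unflatten(flat: Buffer, like: List[Buffer]) -> List[Buffer]:
    """Split a flat buffer back into pieces sized like `like`."""
    out: List[Buffer] = []
    offset = 0
    for buf in like:
        out.append(flat[offset:offset + len(buf)])
        offset += len(buf)
    return out


def broadcast_coalesced_sketch(
    buffers: List[Buffer],
    broadcast: Callable[[Buffer], Buffer],
) -> List[Buffer]:
    """Synchronize `buffers` with one broadcast call instead of one per buffer.

    `broadcast` is a hypothetical callback: it takes this participant's flat
    buffer and returns the root's flat buffer.
    """
    flat = flatten(buffers)
    received = broadcast(flat)  # a single call for all buffers
    return unflatten(received, buffers)


class FakeRoot:
    """Hypothetical stand-in for a collective: always returns the root's data."""

    def __init__(self, root_buffers: List[Buffer]) -> None:
        self.flat = flatten(root_buffers)
        self.calls = 0

    def __call__(self, flat: Buffer) -> Buffer:
        self.calls += 1
        assert len(flat) == len(self.flat), "participants must agree on sizes"
        return list(self.flat)


root_params = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
local_params = [[0.0, 0.0], [0.0], [0.0, 0.0, 0.0]]

link = FakeRoot(root_params)
synced = broadcast_coalesced_sketch(local_params, link)
```

In the single-process case the caller would loop over its own devices and copy the flat buffer to each one; in the multi-process case every rank would run the same function and the callback would be a real collective. As far as I know, the Python-level counterparts are torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size) for the former and a per-rank collective such as torch.distributed.broadcast for the latter, but check the documentation for your version before relying on either.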

FWIW, the function in ddp.cpp should be considered a private API, to be used only by DistributedDataParallel. In the upcoming release it has been removed and replaced with a function that is also called broadcast_coalesced but lives in the c10d namespace. See https://github.com/pytorch/pytorch/blob/caa0d0c/torch/csrc/distributed/c10d/comm.h. See https://github.com/pytorch/pytorch/pull/20234 for the PR that added it.