I want to experiment with my own distributed data parallel strategy, and I have read through the code in torch.distributed. But I am confused by the following two C++ functions. It seems that these two methods are doing the same job, namely broadcasting tensors from one GPU to the others.
My PyTorch version is v1.0.0.
broadcast_coalesced in torch/csrc/cuda/comm.cpp
distBroadcastCoalesced in torch/csrc/distributed/c10d/ddp.cpp
I am wondering what the difference in usage is between these two methods. Thanks!
broadcast_coalesced is used in the single-process case, where one process controls multiple GPUs and pushes the tensors from one of its devices to the others. distBroadcastCoalesced is used when there are multiple processes, one per GPU (or per group of GPUs), and each process makes this call on its own copy of the tensors.
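To make the distinction concrete, here is a rough sketch using the public Python counterparts. torch.cuda.comm.broadcast_coalesced is the Python entry point for the comm.cpp function you found; for the multi-process side I use torch.distributed.broadcast, which is not the coalesced C++ helper itself but shows the same pattern of every rank issuing the same collective. The device IDs, backend, and init settings are placeholder values.

```python
import torch
import torch.cuda.comm as comm
import torch.distributed as dist

# Single-process, multi-GPU: one process owns all devices and copies its
# tensors (all resident on GPU 0 here) to the other GPUs in one coalesced call.
tensors = [torch.randn(4, 4, device="cuda:0"), torch.randn(8, device="cuda:0")]
copies = comm.broadcast_coalesced(tensors, devices=[0, 1])
# copies[i] holds the copies of `tensors` placed on device i.

# Multi-process: every rank runs this same script; each rank calls into the
# collective, and the data on the source rank (src=0) overwrites the others.
dist.init_process_group(backend="nccl", init_method="env://")  # placeholder init
t = torch.randn(4, 4, device="cuda")
dist.broadcast(t, src=0)  # afterwards, t on every rank equals rank 0's t
```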