Is dist.broadcast_multigpu a misnomer?

I am implementing multi-tensor support in a 3rd-party distributed backend for collectives such as broadcast and allreduce. However, the only Python APIs that accept multi-tensor input are the collectives with the _multigpu suffix, like dist.broadcast_multigpu.

Looking at the implementation of these (e.g. here), I don’t see anything specific to GPUs, and they work equally well with our 3rd-party distributed backend.

So, is the suffix _multigpu a misnomer?
If so, can we contribute a change? Of course, this would raise backward-compatibility issues.

Thank you,

Hi @esaliya , thanks for your question!

I don’t think broadcast_multigpu is necessarily a misnomer, since the intent of the name is to convey that each tensor in the list passed into the API must reside on a distinct GPU. For example, see this check in one of the C++ backend implementations: pytorch/ProcessGroupNCCL.cpp at master · pytorch/pytorch · GitHub
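To make the "one tensor per GPU" contract concrete, here is a minimal pure-Python sketch of that kind of validation. The function name and structure are illustrative only (not PyTorch's actual code); it stands in for checking `[t.device.index for t in tensor_list]`:

```python
def check_distinct_devices(device_indices):
    """Raise ValueError unless every tensor sits on its own device.

    `device_indices` stands in for [t.device.index for t in tensor_list]
    in a *_multigpu collective (hypothetical helper, for illustration).
    """
    seen = set()
    for idx in device_indices:
        if idx in seen:
            raise ValueError(
                f"duplicate device {idx}: each tensor passed to a "
                "*_multigpu collective must live on a distinct GPU"
            )
        seen.add(idx)


check_distinct_devices([0, 1, 2])  # accepted: one tensor per GPU
```

Passing two tensors that share a device (e.g. indices `[0, 0]`) would be rejected, which is the behavior the _multigpu suffix is meant to signal.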

I do agree that, from the perspective of a C++ backend implementer, it is not obvious that ::broadcast is expected to receive a list of distinct tensors as input. Is this something that is blocking your implementation of the 3rd-party backend?