Hi, I am wondering why src means different things in torch.distributed._functional_collectives.broadcast and torch.distributed.distributed_c10d.broadcast. In the functional version, src is the local rank inside the group, while in the non-functional version, src is a global rank. This is misleading for users who start with the functional version.
Agreed, that is not ideal. But torch.distributed.distributed_c10d.broadcast is the original collective, so we can't change it for backwards-compatibility reasons. The functional version always requires a group to be passed in, so it can use the rank local to that group.
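
To make the difference between the two meanings of src concrete, here is a minimal sketch in plain Python (no torch needed). It assumes a group can be modeled as the ordered list of global ranks it contains; the helper names group_rank_to_global and global_rank_to_group are hypothetical and only for illustration. In real code, torch.distributed.get_global_rank(group, group_rank) and torch.distributed.get_group_rank(group, global_rank) should do the equivalent conversion.

```python
# Hypothetical helpers (not part of torch). A process group is modeled here
# as the ordered list of global ranks that belong to it.

def group_rank_to_global(group_ranks: list, group_rank: int) -> int:
    """Translate a rank local to the group into a global rank."""
    return group_ranks[group_rank]


def global_rank_to_group(group_ranks: list, global_rank: int) -> int:
    """Translate a global rank into its position inside the group."""
    return group_ranks.index(global_rank)


# Assumed example: a subgroup made of global ranks 4..7.
subgroup = [4, 5, 6, 7]

# The same source process, expressed both ways:
src_local = 1                                          # group-local rank
src_global = group_rank_to_global(subgroup, src_local)  # global rank

print(src_local, src_global)
```

So when moving a call from one API to the other with a subgroup, the src value generally needs converting. For the default (world) group the two numbers coincide, which is why the mismatch is easy to miss.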