For example, suppose two processes each call torch.distributed.reduce(...). If the process that reaches the call first blocks there until the other process also calls it, and then both return, we can say that the processes are synchronized at this function.
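To make that definition concrete, here is a minimal sketch of the behavior I mean. It uses only the standard library, not PyTorch: threading.Barrier stands in for a blocking collective, and run_demo, worker, delay and world_size are hypothetical names chosen for illustration. It shows what "synchronized" would look like and makes no claim about how torch.distributed actually behaves.

```python
import threading
import time


def run_demo(delay=0.2, world_size=2):
    """Return, per rank, the seconds elapsed before the blocking call returned."""
    # Assumption: the barrier is only an analogy for a synchronizing collective.
    barrier = threading.Barrier(world_size)
    returned_at = [None] * world_size

    def worker(rank):
        if rank == 1:
            time.sleep(delay)  # rank 1 reaches the call late
        barrier.wait()  # blocks until every rank has reached this point
        returned_at[rank] = time.monotonic()

    start = time.monotonic()
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [t - start for t in returned_at]


if __name__ == "__main__":
    for rank, elapsed in enumerate(run_demo()):
        print(f"rank {rank} returned after {elapsed:.2f}s")
```

If the call is synchronized in this sense, rank 0 cannot return before the late rank 1 has arrived, so both elapsed times are at least the delay.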
I would like to know whether the collective functions in torch.cuda.comm and torch.distributed are actually synchronized in this sense.