MPI Allreduce Queue


What is the function/reason for enqueue(std::move(entry)); in ProcessGroupMPI Allreduce? I don’t understand this part of the code.

This is where the operation is queued to be executed by a background thread. All collective calls execute on a separate thread so you can block on their completion only when you need the result. This allows for overlapping of gradient reduction with gradient computation.
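The pattern described above can be sketched as a work queue serviced by a single background thread. This is a simplified, hypothetical illustration of the idea, not the actual ProcessGroupMPI implementation: `WorkerQueue`, `enqueue`, and `runLoop` are illustrative names, and the real code queues `WorkEntry` objects that wrap MPI calls.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Minimal sketch: collective calls push a task onto a queue and return
// immediately with a future; one background thread pops tasks and runs
// them in order. The caller blocks only when it waits on the future.
class WorkerQueue {
 public:
  WorkerQueue() : stop_(false), worker_([this] { runLoop(); }) {}
  ~WorkerQueue() {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Analogous to enqueue(std::move(entry)): hand the work to the
  // background thread and return a handle for later completion.
  std::future<void> enqueue(std::function<void()> task) {
    auto p = std::make_shared<std::promise<void>>();
    auto fut = p->get_future();
    {
      std::unique_lock<std::mutex> lock(mutex_);
      queue_.emplace_back([task = std::move(task), p] {
        task();
        p->set_value();
      });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void runLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (stop_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();  // here the real code would run e.g. MPI_Allreduce
    }
  }

  bool stop_;
  std::deque<std::function<void()>> queue_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::thread worker_;  // declared last so it starts after other members
};
```

The caller can call `enqueue` once per collective, keep computing, and only call `.get()` on the future when the reduced result is actually needed.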


When you talk about “overlapping of gradient reduction with gradient computation”, do you mean overlapping forward and backward propagation?

No, the reduction is overlapped with gradient computation (backward propagation) only. As more and more gradients are computed, they become ready to be reduced. There is no need to wait until all of them have been computed before reducing them.
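The overlap can be illustrated with a small sketch, assuming one asynchronous reduction is launched per layer as its gradient becomes ready. All names here are hypothetical, the "allreduce" is simulated with a local doubling, and this is not the real DDP bucketing logic:

```cpp
#include <future>
#include <vector>

// Sketch: backward runs from the last layer to the first. As soon as one
// layer's gradient is ready, its (simulated) allreduce is launched
// asynchronously while backward continues on the remaining layers. The
// caller blocks only once, after backward has finished.
std::vector<float> backward_with_overlap(int num_layers) {
  std::vector<float> grads(static_cast<size_t>(num_layers));
  std::vector<std::future<void>> pending;
  for (int layer = num_layers - 1; layer >= 0; --layer) {
    grads[layer] = static_cast<float>(layer) + 1.0f;  // "compute" gradient
    pending.push_back(std::async(std::launch::async, [&grads, layer] {
      grads[layer] *= 2.0f;  // simulated allreduce of this gradient
    }));
    // backward continues to the next layer without waiting
  }
  for (auto& f : pending) f.get();  // wait for all reductions at the end
  return grads;
}
```

With this structure, communication for layer N runs concurrently with the gradient computation for layers N-1, N-2, and so on.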


@pietern Hi, could you tell me why ProcessGroupNCCL does not use an enqueue(xxx) function? Does NCCL implement gradient synchronization across multiple nodes itself?