MPI Allreduce Queue

Hello,

What is the purpose of enqueue(std::move(entry)); in ProcessGroupMPI's allreduce? I don't understand this part of the code.

This is where the operation is queued to be executed by a background thread. All collective calls execute on a separate thread so you can block on their completion only when you need the result. This allows for overlapping of gradient reduction with gradient computation.
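To illustrate the pattern, here is a minimal, self-contained sketch of such a work queue. This is a simplified stand-in, not the actual ProcessGroupMPI code (which has its own entry and queue types): work is pushed onto a queue, a single background thread pops entries and runs them (e.g. the MPI call), and the caller gets a handle it can block on only when the result is needed.

```cpp
// Simplified sketch of a background work queue, assuming only the
// standard library. Not the real ProcessGroupMPI implementation.
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

class AsyncRunner {
 public:
  AsyncRunner() : stop_(false), worker_([this] { runLoop(); }) {}

  ~AsyncRunner() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Analogous in spirit to enqueue(std::move(entry)): queue the work and
  // return a handle the caller can wait on only when the result is needed.
  std::future<void> enqueue(std::function<void()> work) {
    std::packaged_task<void()> task(std::move(work));
    std::future<void> fut = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void runLoop() {
    while (true) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (stop_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();  // the actual collective (e.g. MPI_Allreduce) would run here
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::packaged_task<void()>> queue_;
  bool stop_;
  std::thread worker_;
};
```

A caller would do something like auto fut = runner.enqueue([&] { /* run the collective */ }); and only call fut.wait() when it actually needs the result, which is the block-only-when-needed behaviour described above.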


When you talk about “overlapping of gradient reduction with gradient computation”, do you mean overlapping the forward and backward passes?

No, the reduction is overlapped with gradient computation (backward propagation) only. As more and more gradients are computed, they become ready to be reduced; there is no need to wait until all of them have been computed before starting to reduce them.
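To make that concrete, here is a hypothetical sketch using MPI-3 non-blocking collectives (MPI_Iallreduce). This is not the PyTorch implementation; it just shows the idea of launching a reduction for each gradient as soon as it is produced and only waiting once everything must be finished (e.g. before the optimizer step).

```cpp
// Hypothetical sketch, assuming plain MPI-3: reduce each gradient as it
// becomes available, block only when all reductions are needed.
#include <mpi.h>
#include <vector>

void reduceGradientsAsTheyArrive(std::vector<std::vector<float>>& grads) {
  std::vector<MPI_Request> requests;
  requests.reserve(grads.size());

  // In real training, each iteration of this loop would correspond to a
  // gradient becoming available during the backward pass.
  for (auto& g : grads) {
    MPI_Request req;
    MPI_Iallreduce(MPI_IN_PLACE, g.data(), static_cast<int>(g.size()),
                   MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD, &req);
    requests.push_back(req);
  }

  // Block only when all reduced gradients are needed.
  MPI_Waitall(static_cast<int>(requests.size()),
              requests.data(), MPI_STATUSES_IGNORE);
}
```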


@pietern Hi, could you tell me why ProcessGroupNCCL does not use the enqueue(xxx) function? Does NCCL implement gradient synchronization across multiple nodes itself?