Gradients update


As I see in the code, there is a Queue used by background threads to communicate the parameters with allreduce. My question is how these parameters are updated. Allreduce is a blocking collective, so the background threads will wait until all of them have enqueued their parameters. So maybe there is the possibility that parameters aren't updated, and a process sometimes isn't taking into account the parameters of the other processes, which have different data. Am I right? Does this affect precision? How does the parameter update work then? What's the point of using a Queue?

Which code were you looking at?

In version 1.0, the distributed backend was updated so that all collectives run asynchronously with respect to the main thread, even if they are blocking. For MPI collectives this means they run on a single background thread. The queue approach was used in PyTorch before version 1.0. In 1.1 we introduced a new C++ based gradient reduction mechanism (see reducer.cpp) that concatenates multiple gradients into a large bucket and performs allreduce against those buckets instead of against individual parameters.
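To make the bucketing idea concrete, here is a simplified, pure-Python sketch (not the actual reducer.cpp implementation; all function names are illustrative). Gradients of several parameters are flattened into one flat bucket, a single allreduce runs over that bucket, and the averaged values are copied back into per-parameter shapes:

```python
def flatten_into_bucket(grads):
    """Concatenate per-parameter gradient lists into one flat bucket."""
    bucket = []
    shapes = [len(g) for g in grads]
    for g in grads:
        bucket.extend(g)
    return bucket, shapes

def unflatten_from_bucket(bucket, shapes):
    """Split a flat bucket back into per-parameter gradients."""
    grads, offset = [], 0
    for n in shapes:
        grads.append(bucket[offset:offset + n])
        offset += n
    return grads

def allreduce_mean(buckets):
    """Stand-in for a real allreduce: element-wise average across ranks."""
    world_size = len(buckets)
    return [sum(vals) / world_size for vals in zip(*buckets)]

# Two simulated ranks, each with gradients for two parameters.
worker_grads = [
    [[1.0, 2.0], [3.0]],   # rank 0
    [[3.0, 4.0], [5.0]],   # rank 1
]

buckets = []
for grads in worker_grads:
    bucket, shapes = flatten_into_bucket(grads)
    buckets.append(bucket)

reduced = allreduce_mean(buckets)   # one collective for all parameters
averaged = unflatten_from_bucket(reduced, shapes)
print(averaged)  # [[2.0, 3.0], [4.0]]
```

The point of bucketing is that one allreduce over a large flat buffer amortizes the per-call latency of the collective much better than one allreduce per parameter.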