Share gradients in multiprocessing

I am wondering if there is a way to share gradients in multiprocessing.
Currently, I am following the hogwild example to run asynchronous training. The shared model is on the CPU. Each local model is on a GPU and has either its own optimizer or a shared one. Each local model can update the shared model directly.
I want to turn this into a synchronous version: each local model uploads its gradients to the shared model, and once a certain number of gradients have been aggregated, the shared model updates itself. However, in the hogwild example, the gradients of the shared model are not shared with the other processes.
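A minimal sketch of that synchronous version, assuming a parameter-server-style loop: the parent process owns the shared CPU model and its optimizer, each worker adds its local gradients into a list of explicitly shared CPU buffers (`grad_buffers`), and two barriers per round keep the workers waiting while the parent averages the buffers and steps. Here "a certain number" is taken to be one gradient from each worker per round. Everything is hypothetical (the toy `make_model`, the fixed per-worker batches, the names `worker` and `run`), and it uses the `fork` start method with CPU-only local models so it can run on Linux without a GPU. With CUDA local models you would switch to `mp.get_context("spawn")` and keep the entry point under `if __name__ == "__main__":`.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn


def make_model():
    # Hypothetical toy model: bias-free linear layer with zero-initialized weights.
    model = nn.Linear(2, 1, bias=False)
    nn.init.zeros_(model.weight)
    return model


def worker(rank, shared_model, grad_buffers, lock, barrier, rounds):
    # Per-process local model (it would live on a GPU in the real setup).
    local_model = make_model()
    for _ in range(rounds):
        # Pull the latest shared parameters before computing gradients.
        local_model.load_state_dict(shared_model.state_dict())
        local_model.zero_grad()
        # Hypothetical per-worker batch; d(sum(x @ w.T))/dw is simply x.
        x = torch.tensor([[rank + 1.0, 1.0]])
        local_model(x).sum().backward()
        # Add the local gradients into the shared buffers; the lock
        # serializes the in-place additions coming from different workers.
        with lock:
            for buf, param in zip(grad_buffers, local_model.parameters()):
                buf += param.grad.cpu()  # .cpu() would move GPU gradients
        barrier.wait()  # all gradients for this round have been pushed
        barrier.wait()  # the parent has finished the optimizer step


def run(num_workers=2, rounds=1, lr=0.1):
    # "fork" keeps this CPU-only sketch simple on Linux; CUDA needs "spawn".
    ctx = mp.get_context("fork")
    shared_model = make_model()
    shared_model.share_memory()  # parameters live in shared memory
    # Gradients go through separate shared tensors, because assigning
    # .grad in one process does not make it visible in the others.
    grad_buffers = [torch.zeros_like(p).share_memory_()
                    for p in shared_model.parameters()]
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=lr)
    lock = ctx.Lock()
    barrier = ctx.Barrier(num_workers + 1)  # workers + this (parent) process

    procs = [ctx.Process(target=worker,
                         args=(rank, shared_model, grad_buffers, lock, barrier, rounds))
             for rank in range(num_workers)]
    for proc in procs:
        proc.start()

    for _ in range(rounds):
        barrier.wait()  # wait until every worker has pushed its gradients
        for param, buf in zip(shared_model.parameters(), grad_buffers):
            param.grad = buf / num_workers  # averaged gradient (a new tensor)
            buf.zero_()                     # reset the buffer for the next round
        optimizer.step()  # in-place update of the shared parameters
        barrier.wait()  # let the workers start the next round

    for proc in procs:
        proc.join()
    return shared_model.weight.detach().clone()


if __name__ == "__main__":
    print(run(num_workers=2, rounds=1))
```

Dividing by `num_workers` averages the gradients, which roughly matches one step on the combined batch when every worker uses a mean-reduced loss over equal-sized batches. For multi-GPU training, `torch.distributed` (for example `all_reduce` or `DistributedDataParallel`) is the more common way to get synchronous updates without a CPU parameter server.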


I want to do exactly the same thing (my processes have local models on multiple GPUs) but have not figured it out yet. The following link has some insights on copying the gradients of the local model to the shared model and vice versa. I guess it should be possible to add the gradients from each local model to the global shared model before taking the optimizer step. Please let me know too if you figure it out.
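The gradient copying mentioned above can be sketched as two small helpers, assuming the local and shared models have the same architecture so that their `parameters()` line up. `push_grads` moves the local gradients onto the shared model's device and either adds them to or overwrites its `.grad`; `pull_params` copies the shared weights back into the local model. Both names are hypothetical. Note that in a multiprocessing setup this only updates the shared model's `.grad` in the calling process; aggregating across processes still requires explicitly shared tensors.

```python
import torch
import torch.nn as nn


def push_grads(local_model, shared_model, accumulate=True):
    """Move gradients from the (possibly GPU) local model onto the shared model."""
    for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
        if lp.grad is None:
            continue  # this parameter received no gradient
        g = lp.grad.detach().to(sp.device)
        if accumulate and sp.grad is not None:
            sp.grad += g          # add to the gradients already collected
        else:
            sp.grad = g.clone()   # clone so the two models never alias storage


def pull_params(shared_model, local_model):
    """Copy the shared parameters back into the local model."""
    with torch.no_grad():
        for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
            lp.copy_(sp)  # copy_ also works across devices (CPU -> GPU)


if __name__ == "__main__":
    shared = nn.Linear(2, 1, bias=False)
    local = nn.Linear(2, 1, bias=False)
    pull_params(shared, local)  # local now matches shared
    local(torch.tensor([[1.0, 2.0]])).sum().backward()
    push_grads(local, shared)
    print(shared.weight.grad)
```

A worker would call `pull_params` before its forward pass and `push_grads` after `backward()`, then step an optimizer built over the shared model's parameters.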

Did you figure it out?