CPU-core sampling, multi-GPU training

How do I get started setting up the following?

  1. Each CPU core holds its own unique buffer of samples.
  2. Each CPU core samples from its own buffer and sends the samples to its corresponding GPU.
  3. Each GPU performs backpropagation N times, accumulating gradients locally. On the Nth backpropagation, the gradients are all-reduced across GPUs and the updated model is sent back to each CPU core.
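
The three steps above can be sketched as a single-process simulation. This is only an illustration of the data flow, not a real multi-GPU setup: the "GPUs" are plain Python loops, the model is a hypothetical one-parameter linear fit (y = w * x), and the names `make_buffer`, `grad`, and `train` are made up for this example. In an actual framework the averaging step would be done by a collective operation (in PyTorch, for instance, `torch.distributed.all_reduce` or `DistributedDataParallel`), but the question does not name a framework, so that mapping is an assumption.

```python
import random


def make_buffer(seed, size=64):
    """Build one worker's private sample buffer of (x, y) pairs.

    Toy data (an assumption for this sketch): y = 3x plus small noise.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.01)) for x in xs]


def grad(w, batch):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(num_workers=4, n_accum=4, rounds=200, lr=0.1, batch_size=8):
    # Step 1: each worker (one CPU core + one GPU) owns a unique buffer.
    buffers = [make_buffer(seed=rank) for rank in range(num_workers)]
    samplers = [random.Random(100 + rank) for rank in range(num_workers)]

    w = 0.0  # model parameter, kept identical on every worker
    for _ in range(rounds):
        local_grads = []
        for rank in range(num_workers):
            accumulated = 0.0
            for _ in range(n_accum):
                # Step 2: sample from this worker's own buffer
                # (stands in for the CPU -> GPU transfer).
                batch = samplers[rank].sample(buffers[rank], batch_size)
                # Step 3a: backpropagate; gradients accumulate locally
                # and are not yet shared with other workers.
                accumulated += grad(w, batch)
            local_grads.append(accumulated / n_accum)

        # Step 3b: after the Nth backward pass, all-reduce (average)
        # the accumulated gradients across workers.
        reduced = sum(local_grads) / num_workers

        # Every worker applies the same update, so the model that goes
        # back to each CPU core is identical everywhere.
        w -= lr * reduced
    return w


if __name__ == "__main__":
    print("learned w:", train())
```

Synchronizing only every N backward passes cuts communication by roughly a factor of N compared with reducing after every step, at the cost of each update being computed from slightly staler parameters.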