How do I get started setting up the following?
- Each CPU core holds its own unique buffer of samples.
- Each CPU core samples from its own buffer and sends the batch to its corresponding GPU.
- Each GPU performs backpropagation N times. On the Nth backpropagation, the gradients are all-reduced across GPUs and the updated model is sent back to each CPU core.
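
The steps above can be sketched as a single-process simulation before moving to real hardware. This is only an illustration under some assumptions: "backpropagation N times" is read as accumulating gradients over N backward passes before one all-reduce, the model is a hypothetical one-parameter linear fit, and all names (`make_buffer`, `local_gradient`, `all_reduce_mean`, `train`) are made up for the example rather than taken from any library.

```python
import random

def make_buffer(seed, size=64):
    """Hypothetical per-worker sample buffer: (x, y) pairs with y = 3x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.01)) for x in xs]

def local_gradient(w, batch):
    """Gradient of mean squared error for the one-parameter model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: every worker receives the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(num_workers=4, n_accum=3, rounds=200, lr=0.1, batch_size=8):
    # Each worker (CPU core + GPU pair) owns its own buffer and sampler.
    buffers = [make_buffer(seed=i) for i in range(num_workers)]
    samplers = [random.Random(100 + i) for i in range(num_workers)]
    weights = [0.0] * num_workers  # one model replica per worker

    for _ in range(rounds):
        accumulated = [0.0] * num_workers
        # N local backward passes per worker, with no communication.
        for _ in range(n_accum):
            for k in range(num_workers):
                batch = samplers[k].sample(buffers[k], batch_size)
                accumulated[k] += local_gradient(weights[k], batch)
        # After the Nth pass: all-reduce, then every replica applies the same update.
        reduced = all_reduce_mean(accumulated)
        weights = [w - lr * g / n_accum for w, g in zip(weights, reduced)]
    return weights

weights = train()
```

Because every replica starts from the same value and applies the same reduced gradient, the replicas stay in sync without copying the full model between workers. In a real multi-GPU job, `all_reduce_mean` would be replaced by the framework's collective operation (for example `torch.distributed.all_reduce`, if PyTorch is the framework in use), and the inner loop over workers would run as separate processes.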