We can send tensors to different devices with to(device="cpu:x").
But how do we get the "cpu:x" device?
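As far as I can tell, PyTorch only exposes a single "cpu" device — there is no per-core "cpu:x"; which core runs the work is decided by the OS / MPI rank, not by .to(). GPUs are the ones that get an index ("cuda:0", "cuda:1", ...). A minimal check (my own sketch):

```python
import torch

# A plain CPU device -- PyTorch does not index individual CPU cores;
# core affinity comes from the OS / MPI, not from .to().
cpu = torch.device("cpu")

# GPUs, by contrast, are indexed: "cuda:0", "cuda:1", ...
gpu = torch.device("cuda", 0)  # same as torch.device("cuda:0")

t = torch.zeros(3)    # lives on the CPU by default
t_cpu = t.to(cpu)     # no-op copy, still on the CPU
print(t_cpu.device)   # -> cpu
print(gpu)            # -> cuda:0
```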
The setting: using MPI to spawn multiple worker processes for computation (not backprop), then moving tensors from each worker to a central GPU (for backprop) and back to each CPU.
More broadly, the question can be posed as:
In deep reinforcement learning, if I have multiple CPU cores sampling data from a simulator,
the gradients for these data batches must be passed to the GPU either serially or synchronously (I'm not sure what the GPU can do), and then the post-gradient-update weights should be sent back to the CPUs so that they can use those weights as the new policy to get more samples from the environment.
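For the "serially" case, one pattern I believe is common (this is my own sketch, with stand-in data, not a confirmed answer) is to accumulate gradients over all the workers' batches and then take a single optimizer step:

```python
import torch
import torch.nn as nn

# Hypothetical setup: one learner model, plus batches gathered from
# several CPU workers (random tensors here stand in for sampled data).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

worker_batches = [torch.randn(8, 4) for _ in range(3)]  # one per worker

opt.zero_grad()
for batch in worker_batches:
    out = model(batch.to(device))  # move the batch to the learner's device
    # Divide so the accumulated gradient is the average over batches.
    loss = out.pow(2).mean() / len(worker_batches)
    loss.backward()                # gradients accumulate in .grad
opt.step()                         # one update over all workers' data
```

The division by the number of batches makes the serial accumulation equivalent to one big averaged batch.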
The problem is that I have no idea how to do the above in code. Maybe it needs to go like this:
1. Every CPU worker holds a copy of the model.
2. After sampling is done, send the samples and the model copy to the GPU for backpropagation.
3. Send the updated model back to the CPU.
Step 2 could probably be done by just calling loss.to(device="cuda:0") before backpropagation… Hmm… I'll test this out if there are no existing examples.
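One caveat I'd want to test first: as I understand autograd, .to() on the loss only copies that one scalar — backward still runs through the graph on whatever device the model's parameters live on, so moving just the loss wouldn't put the heavy computation on the GPU. The pattern I'd try instead is to move the model and the inputs to the GPU and copy the updated state_dict back for the workers. A sketch under those assumptions (falls back to CPU when no GPU is present):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

cpu_model = nn.Linear(4, 2)             # each worker's sampling copy
gpu_model = nn.Linear(4, 2).to(device)  # central learner copy
gpu_model.load_state_dict(cpu_model.state_dict())  # start in sync

opt = torch.optim.SGD(gpu_model.parameters(), lr=0.1)

# 1) workers sample on the CPU (random stand-in data here)
samples = torch.randn(16, 4)

# 2) backprop on the GPU copy: move the *inputs*, not the loss
loss = gpu_model(samples.to(device)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()

# 3) send the updated weights back to the CPU workers
cpu_model.load_state_dict(
    {k: v.cpu() for k, v in gpu_model.state_dict().items()}
)
```

After step 3 the CPU copy matches the post-update GPU weights, so the workers sample with the new policy.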