Suppose I have a machine with 8 GPUs and 64 CPUs.
Using DistributedDataParallel, it will run 8 processes, each using a single GPU. What about CPUs? Are they evenly distributed across the 8 processes? Can we specify how many CPUs each process uses?
Hey @zzzf, DDP does not do anything specific to allocate CPUs.
cc @ptrblck is it possible to pin CPU affinity for a PyTorch process?
I’ve seen approaches to set the CPU affinity for a GPU device using nvml as described here.
However, I don’t know if or how this approach would work for a general PyTorch process, or whether you would benefit from it.
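As a rough sketch of what per-rank pinning could look like without nvml: on Linux you can set a process's CPU affinity with the standard-library `os.sched_setaffinity`. The helper names below (`cpus_for_rank`, `pin_process_to_rank`) and the contiguous even split are my own assumptions for illustration, not anything DDP does for you.

```python
import os

def cpus_for_rank(rank, num_ranks, total_cpus):
    # Assumption: partition CPU ids into equal contiguous blocks, one per rank.
    per_rank = total_cpus // num_ranks
    start = rank * per_rank
    return set(range(start, start + per_rank))

def pin_process_to_rank(rank, num_ranks):
    cpus = cpus_for_rank(rank, num_ranks, os.cpu_count())
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        os.sched_setaffinity(0, cpus)     # 0 means the current process
    return cpus
```

With 8 ranks on a 64-CPU machine, rank 0 would be pinned to CPUs 0-7, rank 7 to CPUs 56-63. You would call `pin_process_to_rank` early in each DDP worker, before data-loader workers are forked so they inherit the affinity mask.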
@pritamdamania87 also suggested
@zzzf please let us know if these solutions would help. Thx!
Thanks for your reply. These are very helpful!
Hi Shen, could you please also take a look at my latest post which is relevant to your tutorial: DistributedDataParallel: model weights and grads not synchronized with multiple forward backward pass? Thanks!
yep, commented there.
DDP is meant to be used with alternating forward and backward passes. I am a little surprised that it didn’t throw an error. Please let us know which version of PyTorch you are using; we might have recently, accidentally, disabled the check for some code paths.